Planning under Uncertainty with Markov Decision Processes: Lecture II Craig Boutilier Department of Computer Science University of Toronto
Recap
Overview
Dimensions of Abstraction (recap): uniform vs. nonuniform; exact vs. approximate; fixed vs. adaptive.
Classical Regression
Example: Regression in SitCalc
Decision-Theoretic Regression
Decision-Theoretic Regression (figure: Q^t(a) formed by regressing the regions G1, G2, G3 of V^{t-1} through action a, with probabilities p1, p2, p3, to obtain conditions such as C1)
Functional View of DTR (figure: V^{t-1} as a tree over CR and M with leaves -10 and 0; DBN factors f_Rm(Rm_t, Rm_{t+1}), f_M(M_t, M_{t+1}), f_T(T_t, T_{t+1}), f_L(L_t, L_{t+1}), f_Cr(L_t, Cr_t, Rc_t, Cr_{t+1}), f_Rc(Rc_t, Rc_{t+1}) over state variables RHM, M, T, L, CR, RHC at times t and t+1)
Functional View of DTR
Q_a^t(Rm_t, M_t, T_t, L_t, Cr_t, Rc_t)
= R + Σ_{Rm,M,T,L,Cr,Rc at t+1} Pr_a(Rm_{t+1}, M_{t+1}, T_{t+1}, L_{t+1}, Cr_{t+1}, Rc_{t+1} | Rm_t, M_t, T_t, L_t, Cr_t, Rc_t) · V^{t-1}(Rm_{t+1}, M_{t+1}, T_{t+1}, L_{t+1}, Cr_{t+1}, Rc_{t+1})
= R + Σ_{Rm,M,T,L,Cr,Rc at t+1} f_Rm(Rm_t, Rm_{t+1}) · f_M(M_t, M_{t+1}) · f_T(T_t, T_{t+1}) · f_L(L_t, L_{t+1}) · f_Cr(L_t, Cr_t, Rc_t, Cr_{t+1}) · f_Rc(Rc_t, Rc_{t+1}) · V^{t-1}(M_{t+1}, Cr_{t+1})
= R + Σ_{M,Cr at t+1} f_M(M_t, M_{t+1}) · f_Cr(L_t, Cr_t, Rc_t, Cr_{t+1}) · V^{t-1}(M_{t+1}, Cr_{t+1})   (the factors over the other variables sum to 1 and drop out)
= f(M_t, L_t, Cr_t, Rc_t)
Functional View of DTR
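A minimal Python sketch of this regression step, assuming illustrative CPT numbers, reward values and a discount of 0.9 (none of which come from the slides): the next-stage variables are summed out factor by factor and, because V^{t-1} mentions only M and CR, only f_M and f_Cr remain.

```python
from itertools import product

GAMMA = 0.9

def f_M(M_t, M_next):
    """Pr(M_next | M_t): mail never disappears, arrives w.p. 0.2 (illustrative)."""
    p_true = 1.0 if M_t else 0.2
    return p_true if M_next else 1.0 - p_true

def f_Cr(L_t, Cr_t, Rc_t, Cr_next):
    """Pr(Cr_next | L_t, Cr_t, Rc_t): a coffee request is usually cleared when
    the robot is at the right location and has coffee (illustrative dynamics)."""
    if Cr_t and L_t and Rc_t:
        p_true = 0.1
    else:
        p_true = 1.0 if Cr_t else 0.3
    return p_true if Cr_next else 1.0 - p_true

def V_prev(M, Cr):
    """V^{t-1} depends only on M and CR, as in the tree above (placeholder values)."""
    return -10.0 if Cr else 0.0

def reward(Cr):
    return -10.0 if Cr else 0.0           # placeholder reward

def Q(M_t, L_t, Cr_t, Rc_t):
    """Only f_M and f_Cr survive, because V^{t-1} mentions only M and CR."""
    exp_future = 0.0
    for M_next, Cr_next in product([False, True], repeat=2):
        exp_future += (f_M(M_t, M_next) *
                       f_Cr(L_t, Cr_t, Rc_t, Cr_next) *
                       V_prev(M_next, Cr_next))
    return reward(Cr_t) + GAMMA * exp_future

# The resulting Q-function depends only on M_t, L_t, Cr_t, Rc_t.
for M_t, L_t, Cr_t, Rc_t in product([False, True], repeat=4):
    print(M_t, L_t, Cr_t, Rc_t, round(Q(M_t, L_t, Cr_t, Rc_t), 3))
```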
Planning by DTR
Structured Value Iteration
Structured Policy and Value Function (figure: tree-structured policy over HCU, HCR, W, R, U and Loc with actions DelC, BuyC, GetU, Go, Noop, and the corresponding value tree with leaves ranging from 5.19 to 10.00)
Structured Policy Evaluation: Trees
A Simple Action/Reward Example (figure: DBN representation of action A over X, Y, Z and reward function R; reward 10 if Z, else 0; the CPT trees have leaf probabilities 1.0, 0.9 and 0.0: a variable stays true once true and otherwise becomes true with probability 0.9 when the preceding variable in the X, Y, Z chain is true)
Example: Generation of V1
V0 = R: 10 if Z, else 0.
Step 1 (regress Z through the action): Z: 1.0 if Z; 0.9 if Y (and not Z); 0.0 otherwise.
Step 2 (expected future value): 10.0 if Z; 9.0 if Y (and not Z); 0.0 otherwise.
Step 3 (V1): 19.0 if Z; 8.1 if Y (and not Z); 0.0 otherwise.
Example: Generation of V2 (figure: V1 as above; Steps 1 and 2 regress V1 through the action, producing trees over X, Y, Z whose leaves give the probabilities of Y and Z at the next step)
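As a sanity check of the V1 numbers above, here is a small flat-state sketch, assuming the dynamics suggested by the trees (a variable stays true once true and otherwise becomes true with probability 0.9 when the preceding variable in the X, Y, Z chain is true) and a discount of 0.9, which is what the 19.0 and 8.1 leaves imply.

```python
from itertools import product

GAMMA = 0.9

def p_true(cur, pred):
    """Pr(variable is true next step): persists once true,
    otherwise becomes true w.p. 0.9 if its predecessor is true."""
    if cur:
        return 1.0
    return 0.9 if pred else 0.0

def transition(state):
    """Distribution over next states under action A; state = (X, Y, Z)."""
    X, Y, Z = state
    pX, pY, pZ = p_true(X, False), p_true(Y, X), p_true(Z, Y)
    dist = {}
    for nX, nY, nZ in product([False, True], repeat=3):
        p = ((pX if nX else 1 - pX) *
             (pY if nY else 1 - pY) *
             (pZ if nZ else 1 - pZ))
        if p > 0:
            dist[(nX, nY, nZ)] = p
    return dist

def reward(state):
    return 10.0 if state[2] else 0.0            # reward 10 if Z

states = list(product([False, True], repeat=3))
V0 = {s: reward(s) for s in states}             # V0 = R

# One Bellman backup for the single action A:  V1 = R + gamma * E[V0]
V1 = {s: reward(s) + GAMMA * sum(p * V0[t] for t, p in transition(s).items())
      for s in states}
for s in states:
    print(s, round(V1[s], 2))   # 19.0 where Z, 8.1 where Y and not Z, 0.0 elsewhere
```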
Some Results: Natural Examples
A Bad Example for SPUDD/SPI
Action a_k makes X_k true, makes X_1 ... X_{k-1} false, and requires X_1 ... X_{k-1} true.
Reward: 10 if all of X_1 ... X_n are true. (Value function for n = 3 is shown.)
Some Results: Worst-case
A Good Example for SPUDD/SPI
Action a_k makes X_k true and requires X_1 ... X_{k-1} true.
Reward: 10 if all of X_1 ... X_n are true. (Value function for n = 3 is shown.)
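A small flat-state check of these two constructions, assuming that an action with an unsatisfied precondition leaves the state unchanged and that the discount factor is 0.9 (neither is stated on the slides). For n = 3, the destructive ("bad") variant yields a distinct optimal value for every one of the 2^n states, so no tree or ADD aggregation is possible, while the non-destructive ("good") variant yields only n + 1 distinct values.

```python
from itertools import product

GAMMA, N = 0.9, 3
STATES = list(product([False, True], repeat=N))

def step(state, k, destructive):
    """Apply a_k (1-based).  The destructive ('bad') version also makes
    X_1..X_{k-1} false; the precondition is that X_1..X_{k-1} are true."""
    s = list(state)
    if all(s[:k - 1]):                    # precondition satisfied
        s[k - 1] = True
        if destructive:
            for i in range(k - 1):
                s[i] = False
    return tuple(s)                       # otherwise: no effect (assumption)

def optimal_values(destructive):
    V = {s: 0.0 for s in STATES}
    for _ in range(500):                  # value iteration to (near) convergence
        V = {s: (10.0 if all(s) else 0.0) +
               GAMMA * max(V[step(s, k, destructive)] for k in range(1, N + 1))
             for s in STATES}
    return V

for destructive in (True, False):
    V = optimal_values(destructive)
    distinct = len({round(v, 6) for v in V.values()})
    name = "bad (destructive)" if destructive else "good"
    print(f"{name} example: {distinct} distinct optimal values")
```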
Some Results: Best-case
DTR: Relative Merits
Approximate DTR
A Pruned Value ADD (figure: leaves of the value diagram replaced by ranges, e.g. [5.19, 6.19], [6.64, 7.64], [7.45, 8.45] and [9.00, 10.00], with interior nodes over HCU, HCR, W, R, U and Loc)
Approximate Structured VI
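The pruning idea can be sketched as follows: collapse leaves of the value tree whose values lie within a tolerance delta into a single leaf labelled by a value range, as in the pruned ADD above. The tree shape and variable ordering below are illustrative guesses; only the leaf values are taken from the slides.

```python
def prune(tree, delta):
    """tree is a number (leaf) or (var, low_subtree, high_subtree).
    Returns a tree whose merged leaves are (lo, hi) value ranges."""
    if not isinstance(tree, tuple):
        return (tree, tree)                      # exact leaf -> degenerate range
    var, low, high = tree
    low, high = prune(low, delta), prune(high, delta)
    if len(low) == 2 and len(high) == 2:         # both children are already ranges
        lo, hi = min(low[0], high[0]), max(low[1], high[1])
        if hi - lo <= delta:
            return (lo, hi)                      # close enough: merge the leaves
    return (var, low, high)

# Fragment of the value tree with the leaf values from the slides; with
# delta = 1.0 the nearby leaves collapse into the ranges shown on the
# pruned ADD, e.g. [5.19, 6.19], [6.64, 7.64] and [9.00, 10.00].
value_tree = ('HCU',
              ('HCR',
               ('W', 5.19, ('R', 6.19, 5.62)),
               ('W', 6.64, ('R', 7.64, 6.81))),
              ('W', 9.00, 10.00))
print(prune(value_tree, delta=1.0))
```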
Approximate DTR: Relative Merits
First-order DT Regression
SitCalc: Domain Model (Recap)
Axiomatizing Causal Laws (Recap)
Stochastic Action Axioms (Recap)
Specifying Objectives (Recap)
First-Order DT Regression: Input
Reward: ∃t.On(B,t,s): 10; ¬∃t.On(B,t,s): 0.
Action load(b,t) with outcomes loadS(b,t): On(b,t) and loadF(b,t): no change.
Outcome probabilities: Rain: 0.7 / 0.3; ¬Rain: 0.9 / 0.1.
First-Order DT Regression: Output
Step 1 (four cases: A, B, C, D)
Step 2
Case A: loadS, pr = 0.7, val = 10; Case D: loadF, pr = 0.3, val = 0.
Step 2: Graphical View
Regressing the reward partition (∃t.On(B,t,s): 10; ¬∃t.On(B,t,s): 0) through load(b,t) yields:
- ∃t.On(B,t,s): stays in the 10-region with probability 1.0; value 10
- ¬∃t.On(B,t,s) ∧ Rain(s) ∧ b=B ∧ loc(b,s)=loc(t,s): reaches the 10-region with probability 0.7 (the 0-region with 0.3); value 7
- ¬∃t.On(B,t,s) ∧ ¬Rain(s) ∧ b=B ∧ loc(b,s)=loc(t,s): reaches the 10-region with probability 0.9 (the 0-region with 0.1); value 9
- (b≠B ∨ loc(b,s)≠loc(t,s)) ∧ ¬∃t.On(B,t,s): stays in the 0-region with probability 1.0; value 0
Step 2: With Logical Simplification
DP with DT Regression
Intra-action Maximization
Intra-action Maximization Example
Inter-action Maximization
FODTR: Summary
FODTR: Implementation
Example Optimal Value Function
Benefits of F.O. Regression
Function Approximation
Linear Function Approximation
Flexibility of Linear Decomposition
Linear Approx: Components
Approximate Value Iteration
Projection
Projection as Linear Program
Vars: w_1, ..., w_k, ε
Minimize: ε
S.T.: ε ≥ V(s) − A_w(s), ∀s
      ε ≥ A_w(s) − V(s), ∀s
ε measures the max-norm difference between V and the "best fit" A_w.
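A minimal sketch of this projection LP for an explicitly enumerated (small) state space, using scipy.optimize.linprog. The basis matrix and target values in the usage example are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def project_max_norm(B, V):
    """B: |S| x k matrix of basis-function values, V: length-|S| target vector.
    Returns (w, eps) where A_w = B @ w is the max-norm best fit to V."""
    n, k = B.shape
    c = np.zeros(k + 1)          # decision vector x = (w_1..w_k, eps)
    c[-1] = 1.0                  # minimize eps
    # eps >= V - Bw   ->  -Bw - eps <= -V
    # eps >= Bw - V   ->   Bw - eps <=  V
    A_ub = np.vstack([np.hstack([-B, -np.ones((n, 1))]),
                      np.hstack([ B, -np.ones((n, 1))])])
    b_ub = np.concatenate([-V, V])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (k + 1))
    return res.x[:k], res.x[-1]

# Toy usage: 4 states, basis = {constant function, indicator of state 3}.
B = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
V = np.array([0.0, 1.0, 2.0, 10.0])
w, eps = project_max_norm(B, V)
print(w, eps)    # eps is the max-norm error of the best linear fit
```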
Approximate Value Iteration
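A compact sketch of the overall approximate value iteration loop on a toy MDP: a full Bellman backup followed by a projection back onto the basis at every iteration. For brevity this sketch projects by least squares; the max-norm LP above could be substituted. The MDP itself is randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(A, S))      # P[a, s, s'] transition probs
R = rng.uniform(0, 1, size=S)                   # state-based reward
B = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0]])  # basis matrix

w = np.zeros(B.shape[1])
for _ in range(100):
    V = B @ w                                          # current approximate value
    backup = R + GAMMA * np.max(P @ V, axis=0)         # full Bellman backup
    w, *_ = np.linalg.lstsq(B, backup, rcond=None)     # projection step
print("weights:", w, "approx V:", B @ w)
```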
Factored MDPs
Assumptions (figure: DBN over X1, X2, X3 with next-state variables X'1, X'2, X'3; additive reward R(X1,X2,X3) = R1(X1,X2) + R2(X3))
Factored AVI
Compactness of Bellman Backup
Compactness of Bellman Backup
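The reason the backup of a single basis function stays compact can be shown directly: if a basis function h mentions only X3', and X3' depends only on X2 and X3 in the DBN, then its backprojection has scope {X2, X3} regardless of how many state variables the MDP has. The CPT numbers below are illustrative placeholders.

```python
from itertools import product

def cpt_X3(x2, x3):
    """Pr(X3' = 1 | X2, X3): illustrative numbers."""
    return 0.95 if x3 else (0.7 if x2 else 0.1)

def h(x3_next):
    """Basis function over X3' only."""
    return 10.0 if x3_next else 0.0

def backprojection(x2, x3):
    """g(X2, X3) = sum_{x3'} P(x3' | X2, X3) * h(x3')."""
    p1 = cpt_X3(x2, x3)
    return p1 * h(1) + (1 - p1) * h(0)

# g is a function of (X2, X3) only: a small table, not one over all states.
g = {(x2, x3): backprojection(x2, x3) for x2, x3 in product([0, 1], repeat=2)}
print(g)
```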
Factored Projection
Variable Elimination
max_{X1 X2 X3 X4 X5 X6} { f1(X1,X2,X3) + f2(X3,X4) + f3(X4,X5,X6) }
Elim X1: replace f1(X1,X2,X3) with f4(X2,X3) = max_{X1} { f1(X1,X2,X3) }
Elim X3: replace f2(X3,X4) and f4(X2,X3) with f5(X2,X4) = max_{X3} { f2(X3,X4) + f4(X2,X3) }
etc. (eliminating each variable in turn until the maximum value over the entire state space is computed)
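A minimal sketch of this elimination scheme for maximizing a sum of factors over binary variables; the two factor tables are arbitrary illustrative numbers.

```python
from itertools import product

def max_out(var, factors):
    """Eliminate `var`: combine all factors mentioning it and max it out.
    A factor is a (scope, table) pair: scope is a tuple of variable names,
    table maps value tuples (ordered as in scope) to numbers."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    scope = sorted(set().union(*(f[0] for f in touching)) - {var})
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        assign = dict(zip(scope, vals))
        best = -float("inf")
        for v in (0, 1):
            assign[var] = v
            total = sum(f[1][tuple(assign[x] for x in f[0])] for f in touching)
            best = max(best, total)
        table[vals] = best
    return rest + [(tuple(scope), table)]

f1 = (("X1", "X2"), {(a, b): 2.0 * a - b for a in (0, 1) for b in (0, 1)})
f2 = (("X2", "X3"), {(b, c): b + 3.0 * c for b in (0, 1) for c in (0, 1)})
factors = [f1, f2]
for var in ("X1", "X2", "X3"):
    factors = max_out(var, factors)
# All variables eliminated: the remaining empty-scope factor holds the global max.
print(factors[0][1][()])   # max over X1,X2,X3 of f1 + f2 (here 5.0)
```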
Factored Projection: Factored LP
Vars: w_1, ..., w_k, ε
Minimize: ε
S.T.: ε ≥ V(s) − A_w(s), ∀s
      ε ≥ A_w(s) − V(s), ∀s
Factored Projection: Factored LP
Factored Projection: Factored LP
Factored Projection: Factored LP
u(f_j, z_1, ..., z_n) = f_j(z_1, ..., z_n; w), ∀ z_1, ..., z_n
Factored Projection: Factored LP
u(g_k, z_1, ..., z_n) ≥ g_{k1}(z_1, ..., z_{n1}) + g_{k2}(z_1, ..., z_{n2}) + ..., ∀ x_k, z_1, ..., z_n
Factored Projection: Factored LP
ε ≥ u_final() ≥ max { Σ_j f_j(z_j; w) : x ∈ X } = max { V(s) − A_w(s) : s ∈ S }
Some Results [GKP-01]
Some Results [GKP-01] Computation Time
Some Results [GKP-01] Computation Time
Some Results [GKP-01] Relative error wrt optimal VF (small problems)
Linear Approximation: Summary
An LP Formulation
Vars: V(s)
Minimize: Σ_s V(s)
S.T.: V(s) ≥ (L_a V)(s), ∀ a, s
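A minimal sketch of this LP for a small explicit MDP, again using scipy.optimize.linprog. The constraint V(s) ≥ (L_a V)(s) is written as (γ·P_a(s,·) − e_s)·V ≤ −R(a,s); the MDP data are randomly generated for illustration.

```python
import numpy as np
from scipy.optimize import linprog

S, A, GAMMA = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(A, S))     # P[a, s, :] = Pr(. | s, a)
R = rng.uniform(0, 1, size=(A, S))             # R[a, s]

# One constraint per (a, s):  (gamma * P[a, s, :] - e_s) @ V <= -R[a, s]
A_ub, b_ub = [], []
for a in range(A):
    for s in range(S):
        row = GAMMA * P[a, s].copy()
        row[s] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[a, s])

res = linprog(c=np.ones(S), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * S)
print("optimal value function:", res.x)
```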
Using Structure in LP Formulation
Good Basis Sets
Parallel Problem Decomposition (figure: decomposition into MDP1, MDP2, MDP3)
Generating SubMDPs
Generating SubMDPs Dynamic Bayes Net over Variable Set
Generating SubMDPs Green SubMDP (subset of variables)
Generating SubMDPs Red SubMDP (subset of variables)
Composing Solutions
Search-based Composition (figure: expectimax tree rooted at s1, with Max nodes over actions a1, a2 and Exp nodes over outcome states s2 ... s5 with probabilities p2, p3, p4)
Search-based Composition (same expectimax tree), with bounds from the sub-MDP value functions:
V(s) ≤ f_1(s) + f_2(s) + ... + f_k(s)
V(s) ≥ max { f_1(s), f_2(s), ..., f_k(s) }
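These bounds are cheap to evaluate, which is what makes them useful during search: a successor whose additive upper bound falls below the best max-based lower bound found so far can be pruned without expansion. A tiny illustrative sketch, with placeholder sub-MDP value functions:

```python
def bounds(fs, s):
    """Lower and upper bounds on V(s) from sub-MDP value functions f_1..f_k."""
    vals = [f(s) for f in fs]
    return max(vals), sum(vals)          # (lower bound, upper bound)

# Placeholder sub-value functions over a two-state abstraction of each sub-MDP.
f1 = {0: 1.0, 1: 4.0}.get
f2 = {0: 2.0, 1: 0.5}.get

for s in (0, 1):
    lo, hi = bounds((f1, f2), s)
    print(f"V({s}) is between {lo} and {hi}")
```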
Offline Composition
Wrap Up
Other Techniques
Extending the Model
References
References (con't)
References (con't)
References (con't)
