Autonomy Incubator Seminar Series: Tractable Robust Planning and Model Learning Under Uncertainty

Tractable Robust Planning and Model Learning Under Uncertainty
Jonathan P. How
Aerospace Controls Laboratory, MIT
jhow@mit.edu
March 17th, 2014
Autonomous Systems: Opportunity
• New era of information and data availability
– To perform efficient data interpretation and information extraction → “Big data” and “Data-to-Decisions”
– In many application domains, including transportation, environment, ocean exploration, and healthcare
• Maturing vehicle GNC raises new challenges in mission design for heterogeneous manned and autonomous assets
• Cost savings and throughput demands are driving rapid infusion of robotic technologies for airframe manufacturing/maintenance
– DARPA driving paradigm shift in rapid prototyping and manufacturing
• Solutions being rapidly developed to the policy issues of integrating autonomous systems into society
– Google car, manufacturing
Example: Driving with Uncertainty
• Goal: Improve road safety for urban driving
• Challenge: World is complex & dynamic
– Must safely avoid many types of uncertain static and dynamic obstacles
– Must accurately anticipate other vehicles' intents and assess the danger involved
Reliable Autonomy for Transportation Systems
– Inference/navigation in dynamic and unstructured environments — GPS-denied navigation
– Provably-correct, real-time planning
– Safety & probabilistic risk assessment
– Learning — model and policy learning
– Shaping autonomy for use by human operators
Navigating busy intersections
DGC '07: MIT/Cornell accident
Planning Without Learning
J. Leonard, J. How, S. Teller, M. Berger, S. Campbell, G. Fiore, L. Fletcher, E. Frazzoli, A. Huang, S. Karaman, et al., A perception-driven autonomous urban vehicle. Springer, 2009.
Example: UAV Turing Test
• Challenge: Autonomous operation at an uncontrolled airport
– UAV must approach the uncontrolled airport, integrate into the traffic pattern, and land in a way that is indistinguishable from a human pilot as observed by other aircraft
• Problem is interesting because, while the general structure of the traffic is known, the specifics must be sensed and the behavior of other traffic inferred
Challenges
• Goal: Automate mission planning to improve performance for multiple UAVs in dynamic, uncertain world
– Real-time planning
– Exploration & exploitation — data fusion
– Planning/inference over contested communication networks
– Human-autonomy interaction
• Challenges:
– Uncertainty: World model is not fully known
– Dynamic: Objective, world, or world model may change
– Stochastic: Same behavior in same situation may result in a different outcome
– Safety: Arbitrary behaviors can be detrimental to mission/system
Similar Challenges in Many Domains
Civil UAVs
Military UAVs
Space Vehicles
Manufacturing
Planning Challenges
• Issue: Most planners are model-based, which enables anticipation
• But models are often approximated and/or wrong
– Model parameter uncertainties
– Modeling errors
• Can yield sub-optimal planner output with large performance mismatch
– Possibly catastrophic mission impact
Planning and Learning
• Two standard approaches
• Baseline Control Algorithms (BCA)
– Fast solutions, but based on simplified models → sub-optimal
– Can provide a good foundation to bootstrap learning
• Mitigates catastrophic mistakes
• Online Adaptation/Learning Algorithms
– Handle stochastic systems/unknown models
– Computational and sample complexity issues
– Exploration can be dangerous
– Can improve on BCA by adapting to time-varying environments and missions, and generating new strategies that are most beneficial
• Issue: How to develop an architecture that realizes this synergistic combination
Planning and Learning
• Intelligent Cooperative Control Architecture (iCCA)
– Synergistic integration of planning and safe learning to improve performance
– Sand-boxing for planning and learning
• Example: simulation with 2 UAVs and 6 targets (~10^8 state-action pairs)
– Cooperative learners perform well with respect to overall reward and risk levels when compared with the baseline planner (CBBA) and non-cooperative learning algorithms
[Figures: 2-UAV, 6-target mission scenario with stochastic target rewards (+100, +200, +300) and time windows; bar chart of optimality (40–90%) for Learner, Planner-Conservative, Planner-Aggressive, iCCA, and iCCA+AdaptiveModel]
iCCA can improve baseline planner performance, but how do we solve the learning problems in real-time?
A. Geramifard et al., “Intelligent cooperative control architecture: A framework for performance improvement using safe learning,” Journal of Intelligent and Robotic Systems, Vol. 72, pp. 83–103, October 2013.
Reinforcement Learning
• Vision: Agents that learn desired behavior from demonstrations or environment signals
• Challenge: Continuous/high-dimensional environments make learning intractable
Algorithms and their properties:
• Bayesian Nonparametric Inverse Reinforcement Learning (BNIRL): efficient inference of subgoals from human demonstrations in continuous domains
• Incremental Feature Dependency Discovery (iFDD): computationally cheap feature expansion & online learning
• Multi-Fidelity Reinforcement Learning (MFRL): efficient use of simulators to explore areas where real-world samples are not needed
B. Michini, M. Cutler, and J. P. How, “Scalable reward learning from demonstration,” in IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2013.
A. Geramifard, F. Doshi, J. Redding, N. Roy, and J. How, “Online discovery of feature dependencies,” in International Conference on Machine Learning (ICML), pp. 881–888, June 2011.
M. Cutler, T. J. Walsh, and J. P. How, “Reinforcement learning with multi-fidelity simulators,” in IEEE International Conference on Robotics and Automation (ICRA), June 2014.
Learning from Demonstration (LfD)
• LfD is an intuitive method for teaching an autonomous system
• Reward learning (vs. policy learning): succinct, transferable representation, but
– Ill-posed (many potential solutions exist)
– Must assume a model of rationality for the demonstrator
– Many demonstrations contain multiple tasks
• Current methods (e.g., IRL, Ng ’00) have limitations
– Parametric rewards; scalability; single reward per demonstration
• Developed Bayesian Nonparametric Inverse RL (see the sketch below)
– Learn multiple subgoal rewards from a single demonstration
– Number of rewards is learned, not specified
– Strategies given for scalability (approximations, parallelizable)
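A rough sketch of the BNIRL idea follows: subgoal assignments for the demonstration points are Gibbs-sampled, with each point joining an existing subgoal in proportion to its popularity (the CRP prior) times how well that subgoal explains the observed action, or opening a new subgoal. The `action_likelihood` helper, e.g. a soft-max over Q-values for a reward peaked at the subgoal, is an assumption supplied by the caller and stands in for the approximations used in the paper.

```python
import numpy as np

def bnirl_gibbs_sweep(demo, assignments, subgoals, action_likelihood,
                      eta=1.0, rng=np.random):
    """One Gibbs sweep over subgoal assignments (BNIRL-style sketch).

    demo:        list of (state, action) pairs from the demonstration
    assignments: current cluster index for each demo pair
    subgoals:    list of subgoal states, one per cluster
    action_likelihood(s, a, g): assumed helper giving P(a | s, reward
                 peaked at subgoal g), e.g. a soft-max over Q-values
    eta:         concentration parameter of the CRP prior
    """
    n = len(demo)
    for i, (s, a) in enumerate(demo):
        # cluster sizes excluding the current point (CRP prior weights)
        counts = np.bincount(np.delete(assignments, i),
                             minlength=len(subgoals)).astype(float)

        # existing subgoals: popularity times demonstrator likelihood
        weights = [counts[k] * action_likelihood(s, a, g)
                   for k, g in enumerate(subgoals)]

        # option of a new subgoal, drawn from the demonstrated states
        g_new = demo[rng.randint(n)][0]
        weights.append(eta * action_likelihood(s, a, g_new))

        probs = np.asarray(weights) / np.sum(weights)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(subgoals):          # a new subgoal was discovered
            subgoals.append(g_new)
        assignments[i] = k
    return assignments, subgoals
```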
B. Michini, M. Cutler, and J. P. How, “Scalable reward learning from demonstration,” in IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2013.
Experiment: BNIRL for Learning Quadrotor Flight Maneuvers
Experimental Results: GPSRL for Learning RC Car Driving Maneuvers
Experimental Results: GPSRL for Learning RC Car Driving Maneuvers
• Continuous, unsegmented demonstration captured and downsampled
• GPSRL partitions the demonstration and learns corresponding subgoal reward functions
Scaling Reinforcement Learning
• Vision: Use learning methods to improve UAV team performance over time
– Typically very high-dimensional state space
– Computationally challenging
• Steps:
– Developed incremental Feature Dependency Discovery (iFDD) as a novel adaptive function approximator (see the sketch below)
• Results:
– iFDD has low per-step computational complexity and asymptotic convergence guarantees
– iFDD outperforms other methods
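A minimal sketch of the iFDD discovery rule, under simplifying assumptions: only pairwise conjunctions of base features are considered and the sparsification step is omitted. The relevance of a candidate conjunction accumulates the magnitude of TD errors seen while both parts were active, and the conjunction is promoted to a feature once that relevance crosses a threshold.

```python
from collections import defaultdict
from itertools import combinations

class IFDDSketch:
    """Simplified incremental Feature Dependency Discovery (iFDD).

    Base features are hashable, sortable ids; discovered features are
    frozensets (conjunctions) of base features.  Only pairwise
    conjunctions are grown here, unlike the full method.
    """

    def __init__(self, psi=1.0):
        self.psi = psi                        # discovery threshold
        self.features = set()                 # discovered conjunctions
        self.relevance = defaultdict(float)   # candidate -> sum |TD error|

    def active_features(self, base_active):
        """Active base features plus every discovered conjunction whose
        parts are all currently active."""
        active = {frozenset([f]) for f in base_active}
        active |= {c for c in self.features if c <= set(base_active)}
        return active

    def discover(self, base_active, td_error):
        """Accumulate |TD error| on co-active pairs; promote a pair to a
        full feature once its accumulated relevance exceeds psi."""
        for f, g in combinations(sorted(base_active), 2):
            cand = frozenset((f, g))
            if cand in self.features:
                continue
            self.relevance[cand] += abs(td_error)
            if self.relevance[cand] > self.psi:
                self.features.add(cand)
                del self.relevance[cand]
```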
A. Geramifard, F. Doshi, J. Redding, N. Roy, and J. How, “Online discovery of feature dependencies,” in International Conference on Machine Learning (ICML), pp. 881–888, June 2011.
RLPy: RL for Education & Research
• Provides a growing library of fine-grained modules for experiments
– (5) Agents, (4) Policies, (10) Representations, (20) Domains
– Modules can be recombined, freeing researchers from reimplementation
• Reproducible, parallel, platform-independent experiments
– Rapid prototyping (Python), support for optimized C code (Cython)
• Tools to automate all parts of the experiment pipeline
– Domain visualization for troubleshooting
– Automatic hyperparameter tuning
http://acl.mit.edu/RLPy/
Multi-Fidelity Reinforcement Learning
• Vision: Leverage simulators to learn optimal behavior with few real-world samples
• Challenges:
– What knowledge should be shared between agents learning on different simulators?
– Choosing which simulator to sample: low-fidelity simulators are less costly but less accurate
• Contributions: Developed MFRL (see the sketch below)
– Lower-fidelity agents send values up to guide exploration
– Higher-fidelity agents send learned parameters down
– Rules for switching levels guarantee a limited number of simulator changes and efficient exploration
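The sketch below captures the flavor of the MFRL control loop; `learn_at_level`, `num_unknown`, the tabular Q representation shared across levels, and the threshold values are placeholder assumptions standing in for the learner described in the paper. The point is that lower-fidelity values seed exploration one level up, and a large disagreement with the level below sends the agent back down.

```python
import numpy as np

def mfrl_loop(num_levels, learn_at_level, num_unknown, q_init,
              tau=0, margin=0.5, max_switches=20, max_iters=1000):
    """Minimal multi-fidelity RL (MFRL) control-loop sketch.

    learn_at_level(level, q_start): run the learner in simulator `level`
        starting from the (optimistic) Q table q_start, return updated Q
    num_unknown(level): number of state-action pairs still under-sampled
        at that level
    """
    level, switches = 0, 0
    q = [None] * num_levels
    q[0] = learn_at_level(0, np.array(q_init, dtype=float))
    for _ in range(max_iters):
        done_here = num_unknown(level) <= tau
        if done_here and level == num_levels - 1:
            break                                    # converged at top fidelity
        if done_here and switches < max_switches:
            # little left to learn: pass values UP as an optimistic start
            level, switches = level + 1, switches + 1
            q[level] = learn_at_level(level, q[level - 1].copy())
        elif (level > 0 and switches < max_switches
              and np.max(np.abs(q[level] - q[level - 1])) > margin):
            # disagreement with the cheaper simulator: go back DOWN and
            # re-learn there with the updated information
            level, switches = level - 1, switches + 1
            q[level] = learn_at_level(level, q[level])
        else:
            q[level] = learn_at_level(level, q[level])   # keep learning here
    return q[level]
```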
[Figure: chain of simulators ordered from lowest to highest fidelity]
Bayesian Nonparametric Models for Robotics
• Often significant uncertainty about the behaviors and intents of other agents in the environment
– Bayesian nonparametric models (BNPs) uniquely provide the flexibility to learn model size & parameters
– Important because it is often very difficult to pre-specify model size
• Example: Gaussian Process (GP) BNP for continuous functions
– Can learn the number of motion models and their velocity fields using a Dirichlet process GP mixture (DP-GP); see the sketch below
– Can also capture temporally evolving behaviors using DDP-GP
• Application: threat assessment
– Model, classify & assess intent/behavior of other drivers and pedestrians
– Embed in robust planner (CC-RRT*)
– Driver aid and/or autonomous car
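To make the DP-GP computation concrete, here is a hedged sketch of the posterior probability that an observed trajectory belongs to each learned motion pattern, or to a previously unseen one. The GP likelihood helpers `gp_loglik` and `new_pattern_loglik` are assumed to come from whatever GP library is used; only the CRP-weighted mixture is shown.

```python
import numpy as np

def dpgp_pattern_posterior(traj, patterns, counts,
                           gp_loglik, new_pattern_loglik, alpha=1.0):
    """Posterior over motion patterns for one observed trajectory under a
    Dirichlet-process mixture of GP velocity fields (DP-GP sketch).

    patterns:            fitted GP velocity-field models, one per pattern
    counts:              number of past trajectories assigned to each pattern
    gp_loglik(traj, gp): marginal log-likelihood of the trajectory's
                         velocities under that pattern's GP (assumed helper)
    new_pattern_loglik(traj): log-likelihood under the GP prior, i.e. a
                         brand-new pattern (assumed helper)
    alpha:               DP concentration parameter
    """
    log_w = [np.log(c) + gp_loglik(traj, gp)
             for c, gp in zip(counts, patterns)]
    log_w.append(np.log(alpha) + new_pattern_loglik(traj))  # unseen pattern
    log_w = np.asarray(log_w)
    log_w -= log_w.max()                                    # stabilize exp
    post = np.exp(log_w)
    return post / post.sum()    # last entry: P(trajectory is a new pattern)
```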
T. Campbell, S. S. Ponda, G. Chowdhary, and J. P. How, “Planning under uncertainty using nonparametric Bayesian models,” in AIAA Guidance, Navigation, and Control Conference (GNC), August 2012.
G. S. Aoude, B. D. Luders, J. M. Joseph, N. Roy, and J. P. How, “Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns,” Autonomous Robots, vol. 35, no. 1, pp. 51–76, 2013.
D. Lin, E. Grimson, and J. Fisher, “Construction of dependent Dirichlet processes based on Poisson processes,” in Neural Information Processing Systems, 2010.
Fast BNP Learning
• Vision: Flexible learning for temporally evolving data without sacrificing speed (real-time robotic systems)
• Challenges:
– Flexible models are computationally demanding (e.g., Gibbs sampling for DP-GP, DDP-GP)
– Computationally cheap models are rigid
• Results: Dynamic Means (see the sketch below)
– Derived from a low-variance asymptotic analysis of the DDP mixture
– Cluster birth, death, and transitions
– Guaranteed monotonic convergence in clustering cost
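The assignment rule at the heart of Dynamic Means can be sketched as follows, with the caveat that the paper's age-weighted mean updates and transition variance are simplified away; the penalty constants `lam` and `q` are illustrative.

```python
import numpy as np

def dynamic_means_step(points, old_means, old_ages, lam=5.0, q=0.1):
    """One-time-step sketch of Dynamic Means clustering.

    Each point takes the cheapest option: join a cluster already active
    this step, revive a dormant cluster from earlier steps (penalized by
    q * age), or open a brand-new cluster (penalty lam).
    """
    means, members = [], []
    for x in points:
        x = np.asarray(x, dtype=float)
        costs = [np.sum((x - m) ** 2) for m in means]              # active
        costs += [q * age + np.sum((x - np.asarray(m)) ** 2)
                  for m, age in zip(old_means, old_ages)]          # revival
        costs += [lam]                                             # birth
        k = int(np.argmin(costs))
        if k < len(means):
            idx = k                                     # join active cluster
        else:
            seed = (np.asarray(old_means[k - len(means)], dtype=float)
                    if k < len(means) + len(old_means) else x)
            means.append(seed)
            members.append([])
            idx = len(means) - 1
        members[idx].append(x)
        means[idx] = np.mean(members[idx], axis=0)      # recenter cluster
    return means, members
```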
[Figure: label accuracy (%) vs. log10 CPU time]
T. Campbell, M. Liu, B. Kulis, J. P. How, and L. Carin, “Dynamic clustering via asymptotics of the dependent Dirichlet process,” in Advances in Neural Information Processing Systems (NIPS), 2013.
Experimental Implementation
• [Movie]
Example: Driving with Uncertainty
• Goal: Improve road safety for urban driving
• Challenge: World is complex & dynamic
– Must safely avoid many types of uncertain static and dynamic obstacles
– Must accurately anticipate other vehicles' intents and assess the danger involved
• Objective: Develop probabilistic models of the environment (cars, pedestrians, cyclists, ...) and a robust path planner that uses these models to safely navigate urban environments
– Distributions over possible intents, and trajectories for each intent
– Efficient enough for real-time use
Navigating busy intersections
DGC '07: MIT/Cornell accident
Approach
• Simultaneous trajectory prediction and robust avoidance of multiple obstacle classes (static and dynamic)
• DP-GP: automatically classifies trajectories into behavior patterns; uses a GP mixture model to compute
– Probability of being in each motion pattern given the observed trajectory
– Position distribution within each pattern at future timesteps (see the propagation sketch below)
→ probabilistic models for propagated (intent, path) uncertainty
• RR-GP: refines predictions based on dynamics and environment
• CC-RRT*: optimized, robust motion planning
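A hedged sketch of the prediction step: a position estimate is propagated through a pattern's GP velocity field, with the GP's predictive variance inflating the covariance over time. The `gp_predict` helper is an assumption (any GP regression library can provide it), and the crude linearization below ignores the Jacobian of the velocity field that the full DP-GP/RR-GP prediction accounts for.

```python
import numpy as np

def propagate_position(mean, cov, gp_predict, dt, horizon):
    """Propagate a Gaussian position estimate through a GP velocity field.

    gp_predict(x) -> (v_mean, v_cov): predictive mean and covariance of
    the velocity at position x (assumed helper).  Returns the predicted
    position means and covariances for `horizon` future timesteps.
    """
    means = [np.asarray(mean, dtype=float)]
    covs = [np.asarray(cov, dtype=float)]
    for _ in range(horizon):
        v_mean, v_cov = gp_predict(means[-1])
        means.append(means[-1] + dt * np.asarray(v_mean))      # move along field
        covs.append(covs[-1] + (dt ** 2) * np.asarray(v_cov))  # grow uncertainty
    return means[1:], covs[1:]
```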
B. D. Luders, S. Karaman, and J. P. How, “Robust sampling-based motion planning with asymptotic optimality guarantees,” in AIAA Guidance, Navigation, and Control Conference (GNC), August 2013.
G. S. Aoude, B. D. Luders, J. M. Joseph, N. Roy, and J. P. How, “Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns,” Autonomous Robots, vol. 35, no. 1, pp. 51–76, 2013.
CC-RRT* for Robust Motion Planning
• Real-time optimizing algorithm with guaranteed probabilistic robustness to internal/external uncertainty
– Leverages RRT: anytime algorithm; quickly explores large state spaces; dynamic feasibility; trajectory-wise constraint checking
• CC-RRT: efficient online risk evaluation (see the chance-constraint sketch below)
– Well-suited to real-time planning/updates with DP-GP motion models
• RRT*: asymptotic optimality
• CC-RRT* is a very scalable algorithm
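In the spirit of CC-RRT's online risk evaluation, the sketch below bounds the probability that a Gaussian-distributed state lies inside a convex polytopic obstacle and checks the summed bounds against a risk budget. The face lists and the budget `delta` are illustrative inputs, and the bound (minimum over faces, summed over obstacles) is the standard conservative one rather than the paper's exact formulation.

```python
import numpy as np
from math import erf, sqrt

def gauss_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def obstacle_collision_bound(mu, sigma, faces):
    """Upper bound on P(x inside obstacle) for x ~ N(mu, sigma), where the
    obstacle is {x : a^T x <= b for every face (a, b)}.  Being inside
    requires satisfying all faces, so the collision probability is at
    most the smallest single-face probability."""
    bounds = []
    for a, b in faces:
        a = np.asarray(a, dtype=float)
        margin = float(b - a @ mu)                    # distance to the face
        std = sqrt(float(a @ np.asarray(sigma) @ a))  # uncertainty along a
        bounds.append(gauss_cdf(margin / std))
    return min(bounds)

def node_feasible(mu, sigma, obstacles, delta=0.01):
    """Keep a tree node only if the summed per-obstacle collision bounds
    stay within the risk budget delta (chance-constraint check)."""
    risk = sum(obstacle_collision_bound(mu, sigma, f) for f in obstacles)
    return risk <= delta
```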
S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” International Journal of Robotics Research, vol. 30, pp. 846–894, June 2011.
Robust Planning Examples
Reliable Autonomy for Transportation
• Vision: Safe, reliable autonomy is a crucial component of the future acceptance and deployment of autonomous systems
• Objective: Develop reliable autonomous systems that can operate safely and effectively for long durations in complex and dynamic environments
– Control theory, verification and validation, autonomous systems, and software safety
• Currently developing a Mobility on Demand system on campus
– Builds on SMART (Frazzoli)
Multiagent Planning With Learning
• Mission: Visually detect target vehicles, then persistently perform track/surveillance using a UGV and UAVs
– On-line planning and learning
– Sensor-failure transition model learned using iFDD
– Policy is re-computed online using Dec-MMDP
• Cumulative cost decreases during the mission
– Improved performance due to learning
• Number of swaps per time period decreases
– Team learns that the initial probability of sensor failure was too pessimistic
[Figures: intermediate cumulative cost (180–260) and number of swaps per 30 minutes (0–25), each plotted against mission time (0–3.5 hours) and decreasing as learning progresses]
N. K. Ure, G. Chowdhary, Y. F. Chen, J. P. How, and J. Vian, “Distributed learning for planning under uncertainty problems with heterogeneous teams,” Journal of Intelligent and Robotic Systems, pp. 1–16, 2013.
Conclusions
• New era of information and data availability
– Many new opportunities in guidance/control & robotics
• Learning and adaptation are keys to reliable autonomy
– Overcoming sample and computational complexity
– More realistic applications
• Discussed model learning, but similar strategies apply to policy learning
• Very exciting times: autonomous cars and UAS in the NAS in our lifetime?
• Many references available at http://acl.mit.edu