© SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger
Explaining Online Reinforcement Learning Decisions
of Self-Adaptive Systems
Felix Feit, Andreas Metzger, Klaus Pohl
ACSOS 2022
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Agenda
A. Metzger, ACSOS 2022 2
1. Motivation
2. Explanation Approach “XRL-DINE“
3. Validation
4. Discussion and Outlook
Online Reinforcement Learning for SAS
[Figure: MAPE-K combined with RL. The self-adaptation logic consists of Monitor, Analyze, Plan, and Execute sharing a Knowledge component. In online RL for SAS, the RL agent observes state S (Monitor), performs action selection (Analyze + Plan), executes action A, and receives reward R and next state S'; the policy (Knowledge) is updated from this feedback loop with the environment.]
• Online RL = emerging approach for addressing design time uncertainty [Palm et al. @ CAiSE 2020]
→ Leverages information only available at runtime (i.e., during live system execution)
• Since 2019, the use of learning has been the most prominent strategy for SAS [Porter et al. @ ACSOS 2020]
• Conceptual model of online RL [Metzger et al. @ Computing 2022]: the learning goal is defined by the reward function
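The conceptual model above can be expressed as a generic interaction loop. This is a minimal sketch; the function names (`env_step`, `select_action`, `update_policy`) are illustrative placeholders, not names from the paper:

```python
def online_rl_loop(env_step, select_action, update_policy, initial_state, steps):
    """Generic online RL interaction loop, mirroring the conceptual model:
    Monitor observes the state, Analyze + Plan select an adaptation action,
    Execute applies it, and the policy (Knowledge) is updated from the reward."""
    state = initial_state
    total_reward = 0.0
    for _ in range(steps):
        action = select_action(state)                     # A + P: action selection
        next_state, reward = env_step(state, action)      # Execute, then Monitor
        update_policy(state, action, reward, next_state)  # policy update from R, S'
        total_reward += reward
        state = next_state
    return total_reward
```

The key property of *online* RL is that `update_policy` runs during live system execution, so the agent exploits information that was unavailable at design time.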
Online Reinforcement Learning for SAS
Increasing use of Deep RL
Policy (Knowledge) represented as a deep neural network
Pro
• Handles continuous states and actions
• Generalizes over unseen, neighboring states
Con
• Deep RL = "black box"
→ Limited trustworthiness
→ Difficult to debug (e.g., is the reward function correctly defined?)
[Illustration of the power of deep learning: "/imagine yellow Labrador in the style of…", generated using Midjourney AI]
[Figure: Classical RL (Q-Learning) vs. Deep RL. Classical Q-Learning stores an explicit table of Q-values per state, with one column per action (UP, RIGHT, DOWN, LEFT). Deep RL replaces the table with a neural network that maps state S to Q-values; in both cases the agent interacts with the environment via state S and action A.]
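The tabular side of the figure corresponds to the classical Q-learning update. A minimal sketch (the learning rate and discount factor values are illustrative):

```python
# Actions indexed as in the illustrated table: 0=UP, 1=RIGHT, 2=DOWN, 3=LEFT
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[state][action] toward the
    bootstrapped target r + gamma * max_a' Q[next_state][a']."""
    target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])

# Usage: a tiny table with two states and four actions, all initialised to 0
Q = [[0.0] * 4 for _ in range(2)]
q_update(Q, 0, 1, 1.0, 1)   # observing reward 1.0 nudges Q[0][1] toward 0.1
```

Deep RL replaces the explicit table `Q` with a network approximating the same function, which is what makes the resulting policy a "black box".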
Explainable Reinforcement Learning (XRL) for SAS
State of the Art
Goal-based Models [Welsh et al. @ Trans. CCI 2014]
• Explanations in terms of the satisficement of softgoals
• Requires making assumptions about environment dynamics at
design time (difficult due to design time uncertainty)
Provenance Graphs [Reynolds et al. @ MODELS Wkshp 2020]
• Graph history used to determine if, how, and why the model changed
• Graph can become too complex to be meaningfully interpreted by humans
• Query language suggested for “pruning” graphs but not for explanations
Temporal Graph Models [Ullauri et al. @ SoSym 2022]
• Explicitly considers Deep RL
• Explanations via queries to model @ runtime
• Interesting points of interaction extracted via complex event processing (CEP)
• No detailed, contrastive decomposition of explanations
(Examples used in these works: vacuum cleaner, Fibonacci, remote data mirroring)
XRL-DINE
Reward Decomposition
[Juozapaitis et al. @ IJCAI Wkshp 2019]
Decomposes the reward function to explain the short-term goal orientation of RL (one sub-RL agent is trained per reward channel)
Pro
• Helpful in the presence of multiple, "competing" quality goals for learning
• Provides contrastive (counterfactual) explanations
Con
• No indication of an explanation's relevance
• Requires manually selecting relevant explanations → cognitive overhead
Interestingness Elements
[Sequeira et al. @ Artif. Intell. 2020]
Identifies relevant moments of interaction between agent and environment at runtime
Pro
• Facilitates automatically selecting relevant interactions to be explained
Con
• Does not explain whether the RL agent behaves as expected and for the right reasons
Augment and combine both RL explanation techniques from AI research
→ Decomposed Interestingness Elements (DINEs)
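Reward decomposition can be illustrated as follows: each sub-agent keeps Q-values for its own reward channel, and the composed decision maximizes their sum. This is a sketch under that assumption; the channel names are made up for illustration:

```python
def composed_action(sub_q_values):
    """sub_q_values maps each reward channel to that sub-agent's Q-values
    per action; the composed agent picks the action with the highest sum.
    The per-channel contributions are what DINEs later expose for explanation."""
    n_actions = len(next(iter(sub_q_values.values())))
    totals = [sum(q[a] for q in sub_q_values.values()) for a in range(n_actions)]
    best = max(range(n_actions), key=totals.__getitem__)
    return best, totals

# Hypothetical channels: 'performance' favours action 1, 'cost' favours action 0;
# summed over channels, action 1 wins (3.5 vs. 3.0)
best, totals = composed_action({"performance": [1.0, 3.0], "cost": [2.0, 0.5]})
```

Keeping the channels separate is what enables contrastive explanations such as "action 1 was chosen *despite* the cost channel preferring action 0".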
XRL-DINE
Understanding RL without DINEs?
• Internal behaviour: evolution of the reward (R)
• External behaviour: evolution of states and actions (S, A)
XRL-DINE
Three Types of DINEs: Important Interaction
Determines whether the RL agent in a given state is uncertain (wide range of actions) or certain (almost always the same action)
• How much does the relative importance of actions differ for each sub-agent?
• The number of DINEs shown can be tuned via threshold ρ (level of inequality)
[Visualization in dashboard: certain vs. uncertain decisions]
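The slide does not give the exact inequality measure, so the following is only one plausible sketch: use the normalized entropy of a softmax over the Q-values as a certainty score, where values near 1 mean the agent almost always picks the same action and values near 0 mean a wide range of actions:

```python
import math

def action_certainty(q_values, temperature=1.0):
    """1 - normalized softmax entropy over the Q-values:
    1.0 = one action clearly dominates (certain),
    0.0 = all actions equally likely (uncertain)."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    probs = [e / z for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(len(q_values))

def important_interaction(q_values, rho=0.9):
    """Flag an 'Important Interaction' DINE when the agent is either very
    certain or very uncertain; rho tunes how many DINEs are shown."""
    c = action_certainty(q_values)
    if c >= rho:
        return "certain"
    if c <= 1.0 - rho:
        return "uncertain"
    return None
```

With a high ρ, only clearly certain or clearly uncertain interactions are surfaced, which keeps the developer's cognitive load low.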
XRL-DINE
Three Types of DINEs: Reward Channel Dominance
The influence that each sub-agent has on each possible action
• Influence of the sub-agents' rewards on the composed decision
[Visualization in dashboard: relative dominance per reward channel]
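One way to sketch this influence (an assumption about the computation; the paper gives the precise definition): for each reward channel, take the advantage of the chosen action over that channel's best alternative, so a positive value means the channel supports the composed decision:

```python
def reward_channel_dominance(sub_q_values, chosen_action):
    """For each reward channel, how strongly its Q-values favour the chosen
    action over the best alternative action: positive = the channel 'votes
    for' the composed decision, negative = it is overruled by other channels."""
    dominance = {}
    for channel, q in sub_q_values.items():
        alternatives = [v for a, v in enumerate(q) if a != chosen_action]
        dominance[channel] = q[chosen_action] - max(alternatives)
    return dominance

# Hypothetical example: 'performance' supports action 0, 'cost' opposes it
dom = reward_channel_dominance({"performance": [3.0, 1.0], "cost": [0.5, 2.0]}, 0)
```

Such a per-channel breakdown yields contrastive explanations of the form "the action was taken because performance dominated, even though cost preferred the alternative".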
XRL-DINE
Three Types of DINEs: Reward Channel Extremum
Points after a local minimum/maximum of the state value → RL decisions in potentially critical states
• ExpectedReward(S) − ExpectedReward(S') > ϕ → maximum
• The number of DINEs shown can be tuned via threshold ϕ
[Visualization in dashboard: local minima and maxima]
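The detection can be sketched directly from the slide's inequality: compare consecutive expected rewards against the threshold ϕ (`phi` below); a drop larger than ϕ marks a local maximum, a rise larger than ϕ a local minimum. A minimal sketch, assuming the check runs over a recorded value trajectory:

```python
def reward_channel_extrema(values, phi=0.5):
    """Scan a trajectory of expected rewards V(S_t) and flag step t as a
    'maximum' when V(S_t) - V(S_t+1) > phi (the value is about to drop),
    or as a 'minimum' when it rises by more than phi.
    A larger phi yields fewer DINEs."""
    extrema = []
    for t in range(len(values) - 1):
        delta = values[t] - values[t + 1]
        if delta > phi:
            extrema.append((t, "maximum"))
        elif -delta > phi:
            extrema.append((t, "minimum"))
    return extrema
```

Applied per reward channel, this surfaces the potentially critical states in which the agent's expected reward turns around.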
Validation
Experimental Setup
Proof-of-Concept Implementation
• Double Deep Q-Networks with Experience Replay [Hasselt et al. @ AAAI 2016]
• Approximation of the environment model using supervised learning on the contents of the replay memory
• OpenAI Gym interface to connect RL and SAS
RL Problem Formulation
• Action space = {add/remove web servers, change dimmer value}
• State space = {request arrival rate, average throughput, average response time}
• Decomposed reward function
Self-Adaptive System
• "SWIM" exemplar [Moreno et al. @ SEAMS 2018]
• Self-adaptive multi-tier web application
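The setup above can be sketched as a Gym-style environment with the stated action and state spaces. Everything below (the class name, the crude queueing dynamics, the numeric constants) is an illustrative stand-in, not SWIM's actual model:

```python
import random

class SwimLikeEnv:
    """Gym-style sketch of a self-adaptive web application.
    Actions: 0 = add server, 1 = remove server, 2 = lower dimmer value.
    State:   (request arrival rate, avg throughput, avg response time).
    Reward:  decomposed into 'performance' and 'cost' channels."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.servers, self.dimmer = 1, 1.0

    def reset(self):
        self.servers, self.dimmer = 1, 1.0
        return self._observe()

    def _observe(self):
        arrival = 50.0 + 10.0 * self.rng.random()              # requests/s
        throughput = min(arrival, 40.0 * self.servers * self.dimmer)
        resp_time = arrival / (40.0 * self.servers)            # crude latency proxy
        return (arrival, throughput, resp_time)

    def step(self, action):
        if action == 0:
            self.servers += 1
        elif action == 1:
            self.servers = max(1, self.servers - 1)
        else:
            self.dimmer = 0.5                                  # serve less optional content
        state = self._observe()
        reward = {"performance": -state[2],                    # penalise response time
                  "cost": -float(self.servers)}                # penalise resources
        return state, reward, False, {}
```

Returning the reward as a dict of channels rather than a scalar is the hook for reward decomposition: each sub-agent trains on its own channel.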
Validation
Qualitative Results
Validation
Quantitative Results
Cognitive load approximated by the number of DINEs shown to developers
[Charts: number of "Important Interactions" and "Reward Channel Extrema" DINEs]
Discussion
Limitations of XRL-DINE
May generate difficult-to-understand explanations
• Reason 1: The reward function was decomposed incorrectly or non-optimally
• Reason 2: Environment dynamics may delay the effects of adaptations and thus hamper understandability
Not directly applicable to collaborative adaptive systems
• XRL-DINE does not consider the decisions of other RL agents
• May lead to misleading explanations if XRL-DINE is directly applied in a collaborative setting
Only works for value-based deep RL
• XRL-DINEs are computed using the value function Q(S, A); for details see the paper
• Value-based deep RL: the policy is derived from Q(S, A), approximated by a neural network
Outlook
Considering Explanation Requirements from Social Sciences [Miller @ Artif. Intell. 2019]
Contrastive: "Why did P happen instead of Q?"
→ "Reward Channel Dominance" DINE
Selective: "No need for the complete course of events"
→ "Reward Channel Extrema" DINE & "Important Interactions" DINE
Causal: "The most likely explanation is not necessarily the best"
→ Future work: e.g., check whether the agent relies on spurious correlations (and not causality) [Gajcin et al. @ AAMAS Wkshp 2022]
Social: "Transfer of knowledge as part of a conversation"
→ Future work: e.g., Chatbot4XAI
[Illustration: example dialogue between a human (explainee) and Chatbot4XAI (explainer)
Chatbot (prediction): "The train will be delayed."
Chatbot (explanation): "The train passed the last light 5 min later than typical."
Human: "This also happened yesterday, but why is it a problem today?"
Chatbot: "Because the attribute 'number of trains behind the current one' > 5."
Human: "What would have to change so that there is no delay (counterfactual)?"
Chatbot: "'Number of trains behind the current one' < 2."]
Thank You!
Research leading to these results has received funding from the EU’s Horizon 2020 research and
innovation programme under grant agreements no. 780351 & 871493
Further Reading
• A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, "Realizing Self-Adaptive Systems via Online Reinforcement Learning and Feature-Model-guided Exploration," Computing, Springer, March 2022
• A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, "Feature model-guided online reinforcement learning for self-adaptive services," in 18th Int'l Conf. on Service-Oriented Computing (ICSOC 2020), LNCS 12571, Springer, 2020
• A. Palm, A. Metzger, and K. Pohl, "Online reinforcement learning for self-adaptive information systems," in 32nd Int'l Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS 12127, Springer, 2020
www.enact-project.eu www.dataports-project.eu

Weitere ähnliche Inhalte

Ähnlich wie Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems

Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
Daiki Ikeuchi
 
Michael_Kogan_portfolio
Michael_Kogan_portfolioMichael_Kogan_portfolio
Michael_Kogan_portfolio
Michael Kogan
 
Michael_Kogan_portfolio
Michael_Kogan_portfolioMichael_Kogan_portfolio
Michael_Kogan_portfolio
Michael Kogan
 
cv_lukas_mandrake_jpl_update2016_formatted
cv_lukas_mandrake_jpl_update2016_formattedcv_lukas_mandrake_jpl_update2016_formatted
cv_lukas_mandrake_jpl_update2016_formatted
Lukas Mandrake
 

Ähnlich wie Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems (20)

A cross-layer approach to energy management in manufacturing
A cross-layer approach to energy management in manufacturingA cross-layer approach to energy management in manufacturing
A cross-layer approach to energy management in manufacturing
 
Robust Tracking Via Feature Mapping Method and Support Vector Machine
Robust Tracking Via Feature Mapping Method and Support Vector MachineRobust Tracking Via Feature Mapping Method and Support Vector Machine
Robust Tracking Via Feature Mapping Method and Support Vector Machine
 
Advances dynamic pile testing technology - Independent Geoscience Pty Ltd
Advances dynamic pile testing technology - Independent Geoscience Pty LtdAdvances dynamic pile testing technology - Independent Geoscience Pty Ltd
Advances dynamic pile testing technology - Independent Geoscience Pty Ltd
 
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
 
"How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo...
"How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo..."How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo...
"How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo...
 
Knowledge Discovery in Environmental Management
Knowledge Discovery in Environmental Management Knowledge Discovery in Environmental Management
Knowledge Discovery in Environmental Management
 
report
reportreport
report
 
Electronics engineer resume
Electronics engineer resumeElectronics engineer resume
Electronics engineer resume
 
Optalysys Optical Processing for HPC
Optalysys Optical Processing for HPCOptalysys Optical Processing for HPC
Optalysys Optical Processing for HPC
 
Program Management in MBSE
Program Management in MBSEProgram Management in MBSE
Program Management in MBSE
 
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
 
PPT FOR BIG
PPT FOR BIGPPT FOR BIG
PPT FOR BIG
 
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
 
3D Television: When will it become economically feasible?
3D Television: When will it become economically feasible?3D Television: When will it become economically feasible?
3D Television: When will it become economically feasible?
 
Zack hsu resume
Zack hsu resumeZack hsu resume
Zack hsu resume
 
Michael_Kogan_portfolio
Michael_Kogan_portfolioMichael_Kogan_portfolio
Michael_Kogan_portfolio
 
Michael_Kogan_portfolio
Michael_Kogan_portfolioMichael_Kogan_portfolio
Michael_Kogan_portfolio
 
EC-TEL 2016: Which Algorithms Suit Which Learning Environments?
EC-TEL 2016: Which Algorithms Suit Which Learning Environments?EC-TEL 2016: Which Algorithms Suit Which Learning Environments?
EC-TEL 2016: Which Algorithms Suit Which Learning Environments?
 
cv_lukas_mandrake_jpl_update2016_formatted
cv_lukas_mandrake_jpl_update2016_formattedcv_lukas_mandrake_jpl_update2016_formatted
cv_lukas_mandrake_jpl_update2016_formatted
 

Mehr von Andreas Metzger

Antrittsvorlesung - APL.pptx
Antrittsvorlesung - APL.pptxAntrittsvorlesung - APL.pptx
Antrittsvorlesung - APL.pptx
Andreas Metzger
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Andreas Metzger
 
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
Andreas Metzger
 
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
Andreas Metzger
 

Mehr von Andreas Metzger (14)

Antrittsvorlesung - APL.pptx
Antrittsvorlesung - APL.pptxAntrittsvorlesung - APL.pptx
Antrittsvorlesung - APL.pptx
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
 
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
 
Data-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software SystemsData-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software Systems
 
Data-driven Deep Learning for Proactive Terminal Process Management
Data-driven Deep Learning for Proactive Terminal Process ManagementData-driven Deep Learning for Proactive Terminal Process Management
Data-driven Deep Learning for Proactive Terminal Process Management
 
Big Data Technology Insights
Big Data Technology InsightsBig Data Technology Insights
Big Data Technology Insights
 
Proactive Process Adaptation using Deep Learning Ensembles
Proactive Process Adaptation using Deep Learning Ensembles Proactive Process Adaptation using Deep Learning Ensembles
Proactive Process Adaptation using Deep Learning Ensembles
 
Data-driven AI for Self-adaptive Information Systems
Data-driven AI for Self-adaptive Information SystemsData-driven AI for Self-adaptive Information Systems
Data-driven AI for Self-adaptive Information Systems
 
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
 
Considering Non-sequential Control Flows for Process Prediction with Recurren...
Considering Non-sequential Control Flows for Process Prediction with Recurren...Considering Non-sequential Control Flows for Process Prediction with Recurren...
Considering Non-sequential Control Flows for Process Prediction with Recurren...
 
Big Data Value in Mobility and Logistics
Big Data Value in Mobility and Logistics Big Data Value in Mobility and Logistics
Big Data Value in Mobility and Logistics
 
Predictive Business Process Monitoring considering Reliability and Risk
Predictive Business Process Monitoring considering Reliability and RiskPredictive Business Process Monitoring considering Reliability and Risk
Predictive Business Process Monitoring considering Reliability and Risk
 
Risk-based Proactive Process Adaptation
Risk-based Proactive Process AdaptationRisk-based Proactive Process Adaptation
Risk-based Proactive Process Adaptation
 
Predictive Process Monitoring Considering Reliability Estimates
Predictive Process Monitoring Considering Reliability EstimatesPredictive Process Monitoring Considering Reliability Estimates
Predictive Process Monitoring Considering Reliability Estimates
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems

  • 1. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems Felix Feit, Andreas Metzger, Klaus Pohl ACSOS 2022
  • 2. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Agenda © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger A. Metzger, ACSOS 2022 2 1. Motivation 2. Explanation Approach “XRL-DINE“ 3. Validation 4. Discussion and Outlook
  • 3. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Online Reinforcement Learning for SAS A. Metzger, ACSOS 2022 3 MAPE-K Combination RL Self-Adaptation Logic Analyze Monitor Execute Plan Knowledge Online RL for SAS Execute Policy (K) Monitor Action Selection (A + P) Policy Update Action A State S Reward R Next state S’ Action A State S Reward R Action Selection Next state S’ RL Agent Policy Policy Update Environ- ment • Online RL = Emerging approach for addressing design time uncertainty [Palm et al. @ CAiSE 2020]  Leverage information only available @ runtime (i.e., during live system execution) • Since 2019 use of Learning for SAS most prominent strategy [Porter et al. @ ACSOS 2020] • Conceptual Model of Online RL: [Metzger et al. @ Computing 2022] Learning Goal defined by Reward Function
  • 4. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Online Reinforcement Learning for SAS Policy (Knowledge) represented as deep neural network Pro • Handling of continuous states and actions • Generalization over unseen, neighboring states Con • Deep RL = “black box” •  Limited trustworthiness •  Difficult to debug e.g., reward function correctly defined? A. Metzger, ACSOS 2022 4 Increasing use of Deep RL The power of deep learning: “/imagine yellow Labrador in the style of…” [generated using Midjourney AI] UP RIGHT DOWN LEFT -12,224348 -12,044198 -12,463232 -12,349292 -11,368766 -11,407281 -11,567699 -11,741966 -10,724603 -10,758294 -10,878073 -10,956671 -10,118299 -9,9104485 -10,209248 -9,9700161 -9,2663503 -9,0282991 -9,4003475 -9,4925009 -8,1970854 -8,145322 -8,3578677 -8,6139991 -7,5583819 -7,3056764 -7,4218645 -7,6021728 -6,5359939 -6,4629871 -6,6290838 -6,800472 -5,8610507 -5,6457718 -5,6477581 -6,1414037 -5,01 -4,8068304 -4,7890915 -4,8609271 -3,9 -3,902123 -3,9078592 -4,0513057 -3,1389 -3,273 -2,9849214 -3,3706948 -12,609306 -12,612216 -12,579926 -12,784373 -11,845763 -11,794108 -11,845683 -12,478945 -11,237922 -10,873018 -10,900784 -11,436694 -10,008564 -9,9425947 -9,9517958 -10,268937 -9,2581936 -8,9864275 -8,9889605 -9,1798613 -8,1866584 -7,9931344 -7,9940185 -8,9510004 -7,0729858 -6,9970914 -6,9977784 -8,2920288 -6,0319487 -5,9988816 -5,9989739 -6,6224484 -6,0996963 -4,9994285 -4,999519 -6,232246 -4,8057376 -3,9997425 -3,9997874 -4,9212681 -3,1293615 -2,9999285 -2,9999384 -3,556979 -2,7588749 -2,1 -2 -2,2345458 -13,360876 -12 -13,954412 -12,991696 -12,565624 -11 -112,1835 -12,995431 -11,733772 -10 -112,79659 -11,980454 -10,740429 -9 -111,48554 -10,947642 -9,95339 -8 -112,12695 -9,9878453 -8,9112 -7 -112,67844 -8,8890419 -7,992178 -6 -112,91331 -7,965498 -6,9846114 -5 -112,41604 -6,9763523 -5,9533325 -4 -111,81117 -5,9401313 -4,9217978 -3 -110,6219 -4,8068301 -3,9307738 -2 -112,85584 -3,9418704 
-2,9796888 -1,9340884 -1 -2,9827181 -13 -112,90933 -13,998779 -13,995334 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 UP RIGHT DOWN LEFT -12,224348 -12,044198 -12,463232 -12,349292 -11,368766 -11,407281 -11,567699 -11,741966 -10,724603 -10,758294 -10,878073 -10,956671 -10,118299 -9,9104485 -10,209248 -9,9700161 -9,2663503 -9,0282991 -9,4003475 -9,4925009 -8,1970854 -8,145322 -8,3578677 -8,6139991 -7,5583819 -7,3056764 -7,4218645 -7,6021728 -6,5359939 -6,4629871 -6,6290838 -6,800472 -5,8610507 -5,6457718 -5,6477581 -6,1414037 -5,01 -4,8068304 -4,7890915 -4,8609271 -3,9 -3,902123 -3,9078592 -4,0513057 -3,1389 -3,273 -2,9849214 -3,3706948 -12,609306 -12,612216 -12,579926 -12,784373 -11,845763 -11,794108 -11,845683 -12,478945 -11,237922 -10,873018 -10,900784 -11,436694 -10,008564 -9,9425947 -9,9517958 -10,268937 -9,2581936 -8,9864275 -8,9889605 -9,1798613 State S Action A State S Action A Deep RL Classical RL (Q-Learning) Environ- ment
  • 5. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Explainable Reinforcement Learning (XRL) for SAS A. Metzger, ACSOS 2022 5 State of the Art Goal-based Models [Welsh et al. @ Trans. CCI 2014] • Explanations in terms of the satisficement of softgoals • Requires making assumptions about environment dynamics at design time (difficult due to design time uncertainty) Provenance Graphs [Reynolds et al. @ MODELS Wkshp 2020] • Graph history to determine if, how and why model changed • Graph can become too complex to be meaningfully interpreted by humans • Query language suggested for “pruning” graphs but not for explanations Temporal Graph Models [Ullauri et al. @ SoSym 2022] • Explicitly considers Deep RL • Explanations via queries to model @ runtime • Interesting points of interactions extracted via CEP • No detailed, contrastive decomposition of explanations Example: Vacuum Cleaner Example: Fibonacci Example: Remote Data Mirroring
  • 6. Agenda 1. Motivation 2. Explanation Approach “XRL-DINE“ 3. Validation 4. Discussion and Outlook
  • 7. XRL-DINE: Augment and Combine RL Explanation Techniques from AI Research
Reward Decomposition [Juozapaitis et al. @ IJCAI Wkshp 2019]: decompose the reward function to explain the short-term goal orientation of RL (train sub-RL agents)
Pro • Helpful in the presence of multiple, “competing” quality goals for learning • Provides contrastive (counterfactual) explanations
Con • No indication of an explanation’s relevance • Requires manually selecting relevant explanations  cognitive overhead
Interestingness Elements [Sequeira et al. @ Artif. Intell. 2020]: identify relevant moments of interaction between agent and environment at runtime
Pro • Facilitates automatically selecting relevant interactions to be explained
Con • Does not explain whether RL behaves as expected and for the right reasons
Combining both yields Decomposed Interestingness Elements (DINEs)
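The core idea of reward decomposition can be sketched in a few lines: each sub-agent learns its own Q-values for one reward channel, and the composed agent acts greedily on their sum. Channel names ("performance", "cost") and all values below are illustrative assumptions, not from the paper:

```python
import numpy as np

# One Q-vector per reward channel (one sub-RL agent each); 3 possible actions.
q_channels = {
    "performance": np.array([0.2, 0.9, 0.1]),
    "cost":        np.array([0.5, -0.4, 0.3]),
}

def composed_decision(q_channels):
    # Composed value: Q(s, a) = sum over channels c of Q_c(s, a)
    q_total = sum(q_channels.values())
    return int(np.argmax(q_total)), q_total

action, q_total = composed_decision(q_channels)
```

Here action 0 wins overall even though the "performance" channel alone prefers action 1; exposing that per-channel disagreement is what enables contrastive explanations.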
  • 8. XRL-DINE: Understanding RL without DINEs?
Internal behaviour: evolution of reward (R)
External behaviour: evolution of states and actions (S, A)
  • 9. XRL-DINE, Three Types of DINEs (1/3): “Important Interaction”
Determine whether RL in a given state is uncertain (wide range of actions) or certain (almost always the same action)
• How much does the relative importance of actions differ for each sub-agent?
• The number of DINEs shown can be tuned via threshold ρ (level of inequality)
Visualization in dashboard: certain vs. uncertain
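One way the "Important Interaction" DINE could be computed is by measuring how unequal the agent's action preferences are and comparing against the threshold ρ. The inequality metric below (gap between the softmax probability of the best action and a uniform policy) is an illustrative assumption; the paper defines its own measure:

```python
import numpy as np

def action_inequality(q_values):
    # Softmax over Q-values; distance of the best action's probability
    # from a uniform policy serves as the level of inequality.
    p = np.exp(q_values - q_values.max())
    p /= p.sum()
    return p.max() - 1.0 / len(q_values)

def is_certain(q_values, rho=0.3):
    # Threshold rho tunes how many DINEs are shown.
    return action_inequality(q_values) > rho

uncertain_state = is_certain(np.array([1.00, 1.01, 0.99]))  # near-uniform preferences
certain_state = is_certain(np.array([5.0, 0.0, 0.0]))       # one dominant action
```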
  • 10. XRL-DINE, Three Types of DINEs (2/3): “Reward Channel Dominance”
Relative influence that each sub-agent has on each possible action
• Influence of the rewards of sub-agents on the composed decision
Visualization in dashboard
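The "Reward Channel Dominance" DINE can be sketched as the relative share each sub-agent's Q-values contribute to the composed value of every action. Channel names are illustrative; the normalization below assumes positive Q-values (negative values would need a different scheme, e.g. over advantages):

```python
import numpy as np

q_channels = {
    "performance": np.array([2.0, 1.0]),  # Q_c(s, a) for 2 actions
    "cost":        np.array([1.0, 3.0]),
}

def channel_dominance(q_channels):
    stacked = np.stack(list(q_channels.values()))  # shape: channels x actions
    shares = stacked / stacked.sum(axis=0)         # relative influence per action
    return dict(zip(q_channels, shares))

dominance = channel_dominance(q_channels)
# "performance" dominates action 0 (share 2/3); "cost" dominates action 1 (share 3/4)
```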
  • 11. XRL-DINE, Three Types of DINEs (3/3): “Reward Channel Extremum”
Points after a local minimum/maximum of the state-value  RL decisions in potentially critical states
• ExpectedReward(S) – ExpectedReward(S’) > ϕ  maximum
• The number of DINEs shown can be tuned via threshold ϕ
Visualization in dashboard: minimum vs. maximum
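The extremum rule on the slide can be sketched directly; the symmetric handling of the minimum case is an assumption:

```python
def reward_channel_extremum(expected_reward_s, expected_reward_s_next, phi=1.0):
    """Classify a transition per the slide's rule:
    ExpectedReward(S) - ExpectedReward(S') > phi  =>  local maximum."""
    delta = expected_reward_s - expected_reward_s_next
    if delta > phi:
        return "maximum"   # expected reward drops after S: potentially critical state
    if delta < -phi:
        return "minimum"
    return None            # no DINE emitted; raising phi shows fewer DINEs

kind = reward_channel_extremum(4.0, 1.5)  # delta = 2.5 exceeds phi
```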
  • 12. Agenda 1. Motivation 2. Explanation Approach “XRL-DINE“ 3. Validation 4. Discussion and Outlook
  • 13. Validation: Experimental Setup
Proof-of-Concept Implementation
• Double Deep Q-Networks with Experience Replay [Hasselt et al. @ AAAI 2016]
• Approximation of the environment model using supervised learning on the contents of the replay memory
• OpenAI Gym interface to connect RL and SAS
RL Problem Formulation
• Action space = {add/remove web servers, change dimmer value}
• State space = {request arrival rate, average throughput, average response time}
• Decomposed reward function
Self-Adaptive System
• “SWIM” exemplar [Moreno et al. @ SEAMS 2018], a self-adaptive multi-tier web application
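Connecting the SAS to the RL agent via a Gym-style step/reset interface with a decomposed reward might look as follows. The class name, its placeholder dynamics, and the reward channel names are illustrative assumptions; the real setup wires the SWIM exemplar to the agent through the OpenAI Gym API:

```python
class SwimEnvSketch:
    """Gym-style wrapper sketch around a self-adaptive web application."""
    ACTIONS = ["add_server", "remove_server", "raise_dimmer", "lower_dimmer"]

    def reset(self):
        # State = (request arrival rate, avg throughput, avg response time)
        self.state = (10.0, 10.0, 0.10)
        return self.state

    def step(self, action_idx):
        action = self.ACTIONS[action_idx]
        rate, throughput, resp_time = self.state
        if action == "add_server":        # placeholder dynamics, not SWIM's
            resp_time *= 0.9
        elif action == "remove_server":
            resp_time *= 1.1
        self.state = (rate, throughput, resp_time)
        # Decomposed reward: one channel per quality goal
        reward = {"latency": -resp_time,
                  "cost": -0.1 if action == "add_server" else 0.0}
        return self.state, reward, False, {}

env = SwimEnvSketch()
env.reset()
state, reward, done, info = env.step(0)  # apply "add_server"
```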
  • 14. Validation: Qualitative Results
  • 15. Validation: Quantitative Results (Important Interactions, Reward Channel Extrema)
Cognitive load ~ number of DINEs shown to developers
  • 16. Agenda 1. Motivation 2. Explanation Approach “XRL-DINE“ 3. Validation 4. Discussion and Outlook
  • 17. Discussion: Limitations of XRL-DINE
May generate difficult-to-understand explanations
• Reason 1: reward function was decomposed incorrectly or non-optimally
• Reason 2: environment dynamics may delay the effects of adaptations and thus reduce understandability
Not directly applicable to collaborative adaptive systems
• XRL-DINE does not consider the decisions of other RL agents
• May lead to misleading explanations if XRL-DINE is directly applied in a collaborative setting
Only works for value-based deep RL
• DINEs are computed using the value function Q(S, A) – for details see the paper
• Value-based deep RL: policy given by Q(S, A), approximated by a neural network
  • 18. Outlook: Considering Explanation Requirements from Social Sciences [Miller @ Artif. Intell. 2019]
Contrastive: “why did P happen instead of Q?”  “Reward Channel Dominance” DINE
Selective: “no need for the complete course of events”  “Reward Channel Extrema” DINE & “Important Interactions” DINE
Causal: “the most likely explanation is not necessarily the best”  Future work: e.g., check whether the agent relies on spurious correlations (and not causality) [Gajcin et al. @ AAMAS Wkshp 2022]
Social: “transfer of knowledge as part of a conversation”  Future work: e.g., Chatbot4XAI
[Example dialogue between a human (explainee) and Chatbot4XAI (explainer) – Prediction: “Train will be delayed.” Explanation: “Train passed the last light 5 min later than typical.” Human: “This also happened yesterday, but why is it a problem today?” Explainer: “Because the attribute ‘number of trains behind current one’ > 5.” Human: “What would have to change so that there is no delay (counterfactual)?” Explainer: “‘Number of trains behind current one’ < 2.”]
  • 19. Thank You! Research leading to these results has received funding from the EU’s Horizon 2020 research and innovation programme under grant agreements no. 780351 & 871493 (www.enact-project.eu, www.dataports-project.eu)
Further Reading
• A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, “Realizing Self-Adaptive Systems via Online Reinforcement Learning and Feature-Model-guided Exploration”, Computing, Springer, March 2022
• A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, “Feature Model-guided Online Reinforcement Learning for Self-Adaptive Services”, in 18th Int’l Conf. on Service-Oriented Computing (ICSOC 2020), LNCS 12571, Springer, 2020
• A. Palm, A. Metzger, K. Pohl, “Online Reinforcement Learning for Self-Adaptive Information Systems”, in 32nd Int’l Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS 12127, Springer, 2020
