Generating Adaptations from the System Execution using Reinforcement Learning Options

Nicolás Cardozo - Ivana Dusparic
@ncardoz - @ivanadusparic
n.cardozo@uniandes.edu.co - ivana.dusparic@scss.tcd.ie
COP’21 - International Workshop on Context-Oriented Programming and Advanced Modularity - July 12 - (Virtual)

Context-oriented programming
COP systems are software systems which have to dynamically adapt their behavior in order to cope with a changing environment. [Acher et al. 09]

[context-aware systems] use context to provide relevant information and/or services to the user, where relevancy depends on the user’s task [Dey 01]
Context-oriented Programming (COP) as a new programming technique to enable context-dependent computation. We claim that Context-oriented Programming brings a similar degree of dynamicity to the notion of behavioral variations [Hirschfeld et al. 08]
Context-oriented programming [6] is a technique to modularize context-dependent behavioral variations in a program, where those behavioral variations can be dynamically switched on and off in response to changes of execution contexts [Aotani et al. 11]
Context-oriented Programming (COP) [4] enriches programming languages and execution environments with features to explicitly represent context-dependent behavior variations. [Appeltauer et al. 08]
Context-Oriented Programming (COP) [3] to support programmers in developing software that can dynamically change its behavior depending on context information. [Afanasov et al. 13]
Context-aware systems are able to adapt their behaviour depending on their context of use without explicit user intervention [Bainomugisha et al. 09]
described as a way to promote runtime variability use and as a mechanism for managing context features dynamically to cater to the needs of dynamic adaptation [Capilla et al. 14]
the goal of Context-oriented Programming is to avoid having to spread context-dependent behavior throughout a program… … can only be applied for context-dependent behavior that are anticipated in the software development process. [Costanza et al. 05]
COP addresses the need for applications to behave differently accordingly to the changing run-time context in which they are embedded. [Ghezzi et al. 10]
In order to implement systems that are able to use the implicit situational information… … the system is able to learn from the user preferences in order to autonomously evolve his rules for future behavior [Alegre et al. 16]
The combination of COP with computational reflection opens further possibilities for runtime software adaptivity. [Gonzalez et al. 09]

Contexts are (meaningful) situations gathered from the surrounding environment
Behavior variations correspond to the specialized behavior appropriate for a specific context
Adaptations correspond to the behavior observed by the system when executing in a context
[S. Gonzalez. Programming in Ambience: Gearing up for dynamic adaptation to context. PhD thesis, 2008]

Phone = {
  current: null,
  active: [],
  incoming: [],
  missed: [],
  terminated: [],
  receive: function(call) {
    display("call ringtone")
    this.incoming.push(call)
  },
  suspend: function(call) {
    this.current = null
    this.active.push(call)
  }
}
// Context
Silent = new cop.Context({
  name: "Silent"
})
// Behavior variation
Discretion = Trait({
  receive: function(call) {
    display("vibrate on call")
    this.incoming.push(call)
  }
})
// Adaptation relation
Silent.adapt(Phone, Discretion)
[S. Gonzalez et al. Context Traits: Dynamic Behaviour Adaptation Through Run-Time Trait Recomposition. AOSD, 2013]
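The activation mechanism behind this example can be sketched in a few lines. The sketch is written in Python rather than the JavaScript of the slides, and the `Context` class, its `adapt`/`activate`/`deactivate` methods, and `discreet_receive` are hypothetical stand-ins, not the Context Traits API: activating a context swaps in the behavior variation, and deactivating it restores the base behavior.

```python
class Context:
    """Hypothetical stand-in for a COP context; not the Context Traits API."""

    def __init__(self, name):
        self.name, self.adaptations, self.saved = name, [], []

    def adapt(self, obj, variation):
        # Record the adaptation relation: which object gets which variation.
        self.adaptations.append((obj, variation))

    def activate(self):
        for obj, variation in self.adaptations:
            for attr, fn in variation.items():
                self.saved.append((obj, attr, obj.__dict__.get(attr)))
                setattr(obj, attr, fn.__get__(obj))  # bind as a method

    def deactivate(self):
        while self.saved:
            obj, attr, old = self.saved.pop()
            if old is None:
                delattr(obj, attr)  # fall back to the class-level behavior
            else:
                setattr(obj, attr, old)


class Phone:
    def __init__(self):
        self.incoming = []

    def receive(self, call):
        self.incoming.append(call)
        return "call ringtone"


def discreet_receive(self, call):
    # Behavior variation for the Silent context.
    self.incoming.append(call)
    return "vibrate on call"


phone, silent = Phone(), Context("Silent")
silent.adapt(phone, {"receive": discreet_receive})
before = phone.receive("c1")
silent.activate()
during = phone.receive("c2")
silent.deactivate()
after = phone.receive("c3")
```

Here the phone rings before and after the context is active and vibrates only while `Silent` is active.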

Off-hook  Silent  Meeting  Forwarding  Behavior
x         x       x                    Ringtone
x         ✔       x                    Vibrate
x         ✔       ✔        x           Vibrate
x         ✔       ✔        ✔           Call forwarding
✔         x       x                    Call waiting
✔         ✔       x                    Call waiting
✔         ✔       ✔        x           Call waiting
✔         ✔       ✔        ✔           Call forwarding

COP is great for modularity!

Meeting = new cop.Context({
  name: "Meeting"
})
ForwardCall = Trait({
  forwardNumber: "1 800 281 7696",
  receive: function(call) {
    this.forwardNumber.makeCall(call)
  }
})
…

Contexts and adaptations are predefined by developers, not extracted from the context
⟹ Not dynamically adaptive to the context

Generating adaptations from the context
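Phone = {
  receive: function(user) {
    display("call ringtone")
  }
}
Silent = new cop.Context({
  name: "Silent"
})
Discretion = Trait({
  receive: function(call) {
    display("vibrate on call")
    this.incoming.push(call)
  }
})
Meeting = new cop.Context({
  name: "Meeting"
})
ForwardCall = Trait({
  forwardNumber: "+5726470585",
  receive: function() {
  }
})
Unknown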

System design
The system must:
• Have users or be autonomous
• Have a defined goal
• Have a way to know whether it progresses towards the goal
• Have a finite set of states
• Have a finite set of actions

Context monitoring
To incorporate contexts (and their behavior) dynamically, we need to capture the system state and possible actions
5x5 grid
move_north()
move_south()
move_east()
move_west()
pickup()
dropoff()
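A minimal sketch of such monitoring, assuming a hypothetical `GridRobot` class. The state is taken to be `(row, col, free)`, where `free` is an assumed reading of the boolean in the trace (true while the robot carries nothing). The package and delivery cells and the rewards (10 for pickup, 20 for dropoff, 0 for moves) are borrowed from the trace lines, and the action names follow the trace (`south`) rather than the method names (`move_south()`). Each call to `step` returns one record with the state before, the action, the state after, and the reward.

```python
from dataclasses import dataclass

SIZE = 5  # 5x5 grid, as on the slide
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


@dataclass
class GridRobot:
    row: int = 0
    col: int = 0
    free: bool = True          # assumed meaning of the boolean in the trace
    package: tuple = (2, 3)    # hypothetical package location
    delivery: tuple = (4, 1)   # hypothetical delivery location

    def state(self):
        return (self.row, self.col, self.free)

    def step(self, action):
        """Apply one action and return a record (state, action, next state, reward)."""
        before, reward = self.state(), 0
        here = (self.row, self.col)
        if action in MOVES:
            dr, dc = MOVES[action]
            # Clamp to the grid so the state set stays finite.
            self.row = min(max(self.row + dr, 0), SIZE - 1)
            self.col = min(max(self.col + dc, 0), SIZE - 1)
        elif action == "pickup" and self.free and here == self.package:
            self.free, reward = False, 10
        elif action == "dropoff" and not self.free and here == self.delivery:
            self.free, reward = True, 20
        return (*before, action, *self.state(), reward)


robot = GridRobot(row=2, col=3)
trace = [robot.step(a) for a in ["pickup", "south", "west"]]
```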

Execution trace
2,3,true,pickup,2,3,false,10,p=0
2,3,false,south,3,3,false,0,p=0
3,3,false,west,3,2,false,0,p=0
4,1,false,dropoff,4,1,true,20,p=0
0,0,true,east,0,1,true,0,p=0
0,1,true,south,1,1,true,0,p=0
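The trace lines can be read as records of the form state, action, next state, reward. The sketch below parses them under that assumption. The field names of the hypothetical `Step` tuple are illustrative, and the trailing `p=0` field is not explained in the slides, so it is skipped.

```python
from typing import NamedTuple


class Step(NamedTuple):
    row: int
    col: int
    flag: bool
    action: str
    next_row: int
    next_col: int
    next_flag: bool
    reward: int


def parse_line(line: str) -> Step:
    f = line.strip().split(",")
    # The trailing "p=0" field (f[8]) is left uninterpreted.
    return Step(int(f[0]), int(f[1]), f[2] == "true", f[3],
                int(f[4]), int(f[5]), f[6] == "true", int(f[7]))


lines = [
    "2,3,true,pickup,2,3,false,10,p=0",
    "2,3,false,south,3,3,false,0,p=0",
    "4,1,false,dropoff,4,1,true,20,p=0",
]
steps = [parse_line(line) for line in lines]
```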

Adaptation extraction

We are interested in action sequences that are executed often for a state: they are the appropriate behavior for that state
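One way to find such sequences is to count, for every state, which action sequences start there. This is a minimal sketch with a hypothetical `frequent_sequences` helper, a fixed sequence length, and made-up episodes; it is not the authors' implementation.

```python
from collections import Counter, defaultdict


def frequent_sequences(episodes, length):
    """Count, per start state, how often each action sequence of `length` follows it."""
    counts = defaultdict(Counter)
    for episode in episodes:  # episode: list of (state, action) pairs
        for i in range(len(episode) - length + 1):
            state = episode[i][0]
            actions = tuple(a for _, a in episode[i:i + length])
            counts[state][actions] += 1
    return counts


# Made-up episodes that all start in state (0, 0, True).
episodes = [
    [((0, 0, True), "east"), ((0, 1, True), "south"), ((1, 1, True), "east")],
    [((0, 0, True), "east"), ((0, 1, True), "south"), ((1, 1, True), "east")],
    [((0, 0, True), "south"), ((1, 0, True), "east"), ((1, 1, True), "east")],
]
counts = frequent_sequences(episodes, 3)
best, times = counts[(0, 0, True)].most_common(1)[0]
```

In this toy data the sequence east, south, east is the most frequent one for state `(0, 0, True)`.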

Adaptation definition
0,0,true: east, south, east
Context00true = new cop.Context({
  name: "Context00true"
})
Context00trueBehavior = Trait({
  option: function() {
    this.east();
    this.south();
    this.east();
  }
})
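Generating such a definition from a state and its action sequence is mostly string templating. The sketch below assumes the naming scheme visible on the slide (state `0,0,true` becomes `Context00true`, with a trait named `Context00trueBehavior` that has a single `option` function). `context_name` and `generate_adaptation` are hypothetical helpers.

```python
def context_name(state):
    row, col, flag = state
    return f"Context{row}{col}{str(flag).lower()}"


def generate_adaptation(state, actions):
    """Return source text for a context and its behavior trait."""
    name = context_name(state)
    body = "\n".join(f"    this.{a}();" for a in actions)
    return (
        f'{name} = new cop.Context({{\n  name: "{name}"\n}})\n'
        f"{name}Behavior = Trait({{\n  option: function() {{\n{body}\n  }}\n}})"
    )


source = generate_adaptation((0, 0, True), ["east", "south", "east"])
print(source)
```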

Adaptations are extracted through reinforcement learning
[Figure: architecture diagram with the components ENVIRONMENT, EXECUTION TRACE (State1 - Action1, State2 - Action2, State1 - Action3, State3 - Action2, …, State3 - UserAction4), ADAPTATION ENGINE (RL option learner, adaptation generator; action sequences, state-option pairs, ContextA - AdaptationA, ContextB - AdaptationB, ...), and ADAPTIVE SYSTEM (system primitive actions, learned behavioral adaptations; Execute(action), Execute(adaptation))]
1. Extract action sequences of variable lengths
2. Use RL options to choose the best action sequence for each state
3. Generate an adaptation for the selected action sequence
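The first step can be sketched as a sliding window over the actions of an episode. The helper `variable_length_sequences` and the length bounds `min_len` and `max_len` are assumptions made for illustration; the slides do not state which lengths are used.

```python
def variable_length_sequences(actions, min_len=2, max_len=4):
    """Yield (start_index, sequence) for every window of min_len..max_len actions."""
    for start in range(len(actions)):
        for length in range(min_len, max_len + 1):
            if start + length <= len(actions):
                yield start, tuple(actions[start:start + length])


episode = ["south", "west", "west", "south", "dropoff"]
candidates = list(variable_length_sequences(episode))
```

Each candidate keeps its start index, so it can later be associated with the state in which the sequence began.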

RL options
[Figure: example rewards -1, -1, 1, 1, 0, 10]
1. Action (sequences) get a reward for every state
2. Accumulate rewards for action sequences: $\sum_{a \in A} q(s, a)$
3. Use RL to learn the best action sequence as the option, based on the accumulated reward
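As a small worked example of the accumulation, the sketch below sums q-values along two candidate sequences and keeps the one with the larger total. The q-values, the state labels `s0`..`s3`, and the reading that each action is scored in the state where it is taken are assumptions for illustration.

```python
def accumulated_value(q, state_action_path):
    """Sum q(s, a) over the (state, action) pairs visited by a sequence."""
    return sum(q[(s, a)] for s, a in state_action_path)


# Hypothetical q-values for two candidate sequences starting in state "s0".
q = {("s0", "east"): 1, ("s1", "south"): 1, ("s2", "east"): 10,
     ("s0", "south"): -1, ("s3", "east"): -1}
candidates = {
    ("east", "south", "east"): [("s0", "east"), ("s1", "south"), ("s2", "east")],
    ("south", "east"): [("s0", "south"), ("s3", "east")],
}
scores = {seq: accumulated_value(q, path) for seq, path in candidates.items()}
option = max(scores, key=scores.get)
```

With these numbers the first sequence accumulates 1 + 1 + 10 = 12 and the second -1 + -1 = -2, so the first becomes the option for `s0`.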

RL options
for i in 1..batchSize:
    while not done:
        if random() >= 𝜀:
            a = argmax(q_val[s])
        else:
            a = random_action()
        r, next_state, done = step(a)
        q_val[s][a] = (1-𝛼)*q_val[s][a] + 𝛼*(r + 𝛾*max(q_val[next_state]))
        action_sequence[s].push(a, r + q_val[s][a])
        s = next_state
-----
options.add(s, action_sequence)
while true:
    if available_adaptation(s):
        context, option = pick_option(𝜀, s)
        context.activate()
        execute(option)
        context.deactivate()
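A runnable sketch of the learning phase, using plain tabular Q-learning on a toy problem. The five-cell corridor, its rewards, the hyperparameter values, and the helper names (`step`, `greedy`, `learn`, `extract_option`) are all assumptions chosen to keep the example small; they are not the setup used in the talk. After learning, the greedy action sequence from a state is taken as that state's option.

```python
import random

ACTIONS = ["right", "left"]
GOAL, ALPHA, GAMMA, EPSILON = 4, 0.5, 0.9, 0.1


def step(s, a):
    """Deterministic corridor 0..4: reaching GOAL pays 10, any other move costs 1."""
    nxt = min(s + 1, GOAL) if a == "right" else max(s - 1, 0)
    return (10 if nxt == GOAL else -1), nxt, nxt == GOAL


def greedy(q, s):
    return max(ACTIONS, key=lambda a: q[(s, a)])


def learn(episodes=500, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability EPSILON, otherwise exploit.
            a = rng.choice(ACTIONS) if rng.random() < EPSILON else greedy(q, s)
            r, nxt, done = step(s, a)
            best_next = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] = (1 - ALPHA) * q[(s, a)] + ALPHA * (r + GAMMA * best_next)
            s = nxt
            if done:
                break
    return q


def extract_option(q, s, max_len=10):
    """Follow the greedy policy from s to build the action sequence used as the option."""
    seq = []
    while s != GOAL and len(seq) < max_len:
        a = greedy(q, s)
        seq.append(a)
        _, s, _ = step(s, a)
    return seq


q = learn()
option = extract_option(q, 0)
```

The update bootstraps from the best q-value of the next state, which is the standard Q-learning target.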

Warehouse robot delivery
Robot moves in a defined space searching for packages and takes them to the delivery area
Packages are at fixed locations
Paths to delivery are always the same!

There is a context for each location of the robot, for each product
ContextDiamond23 = new cop.Context({
  name: "Diamond-2,3"
})
ContextShirt20 = new cop.Context({
  name: "Shirt-2,0"
})
ContextCarrot44 = new cop.Context({
  name: "Carrot-4,4"
})
…

Context23false = new cop.Context({
  name: "Context23false"
})
BAContext23false = Trait({
  option: function() {
    this.south();
    this.west();
    this.west();
    this.south();
    this.dropoff();
  }
})
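A quick way to check a generated option is to replay it on the grid. The sketch below assumes the coordinate convention suggested by the execution trace (states are `row,col`, `south` increases the row, `west` decreases the column); `run_option` is a hypothetical helper.

```python
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


def run_option(start, actions):
    """Replay an option on the grid and return the visited cells."""
    row, col = start
    path = [(row, col)]
    for action in actions:
        if action in MOVES:  # pickup/dropoff do not move the robot
            dr, dc = MOVES[action]
            row, col = row + dr, col + dc
            path.append((row, col))
    return path


option = ["south", "west", "west", "south", "dropoff"]
path = run_option((2, 3), option)
```

Starting from cell 2,3, the option ends at cell 4,1, which is where the `dropoff` line of the execution trace takes place.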

✓ Continuously process execution traces to extract action sequences and their state
✓ Generate adaptations from extracted options
✓ Use RL to manage options as the system’s most appropriate behavior, and continuously update new options
Pushing COP forward beyond enabling dynamic behavior variations: Auto-COP lets systems become adaptive to unknown contexts and behavior
@ncardoz n.cardozo@uniandes.edu.co
Explore more system types
Integrate lifelong learning techniques to manage generated adaptations
