Marinier Laird Cogsci 2008 Emotionrl Pres

Emotion-Driven
Reinforcement Learning
Bob Marinier & John Laird
University of Michigan, Computer Science and Engineering
CogSci’08

2

Introduction
• Interested in the functional benefits of emotion
for a cognitive agent
▫ Appraisal theories of emotion
▫ PEACTIDM theory of cognitive control
• Use emotion as a reward signal to a
reinforcement learning agent
▫ Demonstrates a functional benefit of emotion
▫ Provides a theory of the origin of intrinsic reward

3

Outline
• Background
▫ Integration of emotion and cognition
▫ Integration of emotion and reinforcement learning
▫ Implementation in Soar
• Learning task
• Results

4

Appraisal Theories of Emotion
• A situation is evaluated along a number of appraisal
dimensions, many of which relate the situation to
current goals
▫ Novelty, goal relevance, goal conduciveness, expectedness,
causal agency, etc.
• Appraisals influence emotion
• Emotion can then be coped with (via internal or
external actions)
[Diagram: Situation and Goals feed Appraisals, Appraisals produce Emotion, and Coping acts back on the Situation and Goals]

5

Appraisals to Emotions (Scherer 2001)
Appraisal dimension            Joy                  Fear           Anger
Suddenness                     High/medium          High           High
Unpredictability               High                 High           High
Intrinsic pleasantness         -                    Low            -
Goal/need relevance            High                 High           High
Cause: agent                   -                    Other/nature   Other
Cause: motive                  Chance/intentional   -              Intentional
Outcome probability            Very high            High           Very high
Discrepancy from expectation   -                    High           High
Conduciveness                  Very high            Low            Low
Control                        -                    -              High
Power                          -                    Very low       High

6

Cognitive Control: PEACTIDM (Newell 1990)
Perceive:   Obtain raw perception
Encode:     Create domain-independent representation
Attend:     Choose stimulus to process
Comprehend: Generate structures that relate stimulus to tasks and can be used to inform behavior
Task:       Perform task maintenance
Intend:     Choose an action, create prediction
Decode:     Decompose action into motor commands
Motor:      Execute motor commands
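
The eight steps above can be sketched as an ordered cycle. This is only an illustration: the names `Step`, `next_step`, and `one_cycle` are hypothetical, and the fixed ordering is a simplifying assumption (the learning task later in the talk has the agent choose between steps such as Attend and Task).

```python
from enum import Enum


class Step(Enum):
    """PEACTIDM steps, in the order listed on the slide."""
    PERCEIVE = "Obtain raw perception"
    ENCODE = "Create domain-independent representation"
    ATTEND = "Choose stimulus to process"
    COMPREHEND = "Relate stimulus to tasks to inform behavior"
    TASK = "Perform task maintenance"
    INTEND = "Choose an action, create prediction"
    DECODE = "Decompose action into motor commands"
    MOTOR = "Execute motor commands"


def next_step(step: Step) -> Step:
    """Return the step that follows `step`, wrapping Motor back to Perceive."""
    order = list(Step)
    return order[(order.index(step) + 1) % len(order)]


def one_cycle(start: Step = Step.PERCEIVE) -> list:
    """Walk once around the cycle (assumed fixed order), starting at `start`."""
    steps = [start]
    while len(steps) < len(Step):
        steps.append(next_step(steps[-1]))
    return steps


if __name__ == "__main__":
    for s in one_cycle():
        print(f"{s.name.title():<11}{s.value}")
```

The initials of the steps, read in order, spell the acronym PEACTIDM.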

7

Unification of PEACTIDM and Appraisal Theories

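[Diagram: the PEACTIDM cycle annotated with the data passed between steps and the appraisals generated along the way.
Cycle: Environmental Change -> Perceive -> Raw Perceptual Information -> Encode -> Stimulus Relevance -> Attend -> Stimulus chosen for processing -> Comprehend -> Current Situation Assessment -> Intend -> Action -> Decode -> Motor Commands -> Motor -> Environmental Change. Intend also produces a Prediction.
Appraisals shown: Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness, Outcome Probability, Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power]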

8

Distinction between emotion, mood, and feeling
(Marinier & Laird 2007)
• Emotion: Result of appraisals
▫ Is about the current situation
• Mood: “Average” over recent emotions
▫ Provides historical context
• Feeling: Emotion “+” Mood
▫ What agent actually perceives
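
A minimal sketch of how the three terms could relate, assuming each is a vector of appraisal values in [-1, 1]. The function names, the `rate` parameter, the moving-average update, and the clamped sum are all illustrative assumptions; the slide puts "Average" and "+" in quotes, so the actual model's functions are likely different.

```python
def update_mood(mood: dict, emotion: dict, rate: float = 0.1) -> dict:
    """Move each mood dimension a fraction `rate` toward the current emotion.

    Assumed exponential moving average standing in for the slide's "Average".
    """
    return {dim: mood[dim] + rate * (emotion[dim] - mood[dim]) for dim in mood}


def combine(emotion: dict, mood: dict) -> dict:
    """Assumed stand-in for Emotion "+" Mood: per-dimension sum clamped to [-1, 1]."""
    return {dim: max(-1.0, min(1.0, emotion[dim] + mood[dim])) for dim in emotion}


if __name__ == "__main__":
    mood = {"conduciveness": 0.0, "unpredictability": 0.0}
    # Hypothetical stream of emotions, one per situation the agent appraises.
    for emotion in (
        {"conduciveness": 0.8, "unpredictability": 0.2},
        {"conduciveness": -0.4, "unpredictability": 0.9},
    ):
        feeling = combine(emotion, mood)      # what the agent perceives now
        mood = update_mood(mood, emotion)     # history carried forward
        print(feeling, mood)
```

The point of the sketch is the division of labor: emotion depends only on the current situation, mood carries history, and feeling is what the agent perceives.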

10

Intrinsically Motivated Reinforcement Learning
(Sutton & Barto 1998; Singh et al. 2004)
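[Diagram: intrinsically motivated reinforcement learning. The External Environment exchanges Actions and Sensations with the “Organism”. Inside the Organism, the Internal Environment contains the Critic, here the Appraisal Process, which sends States and Rewards to the Agent and receives its Decisions. The reward is built from the Feeling's Intensity and its +/- sign]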

• Reward = Intensity * Valence
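
The reward formula above can be written as a one-line function. The value ranges checked here (intensity in [0, 1], valence in [-1, 1]) are assumptions for illustration; the slide does not state them.

```python
def reward(intensity: float, valence: float) -> float:
    """Reward = Intensity * Valence, as stated on the slide.

    Assumed ranges: intensity in [0, 1], valence in [-1, 1].
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity is assumed to lie in [0, 1]")
    if not -1.0 <= valence <= 1.0:
        raise ValueError("valence is assumed to lie in [-1, 1]")
    return intensity * valence


if __name__ == "__main__":
    print(reward(0.5, 1.0))    # a mildly intense, positive feeling
    print(reward(0.5, -1.0))   # the same intensity, negative feeling
```

Under these assumptions a strong negative feeling gives a large negative reward, and a feeling with zero intensity gives no reward regardless of its sign.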

11

Extending Soar with Emotion
[Diagram: Soar architecture. Symbolic Long-Term Memories (Procedural, Semantic, Episodic) with their learning mechanisms (Reinforcement Learning, Chunking, Semantic Learning, Episodic Learning); Short-Term Memory holding Situation and Goals; Appraisal Detector; Decision Procedure; Perception, Visual Imagery, and Action connected to the Body]

12

Extending Soar with Emotion
[Diagram: the same Soar architecture with emotion added. The Appraisal Detector produces an Emotion (.5,.7,0,-.4,.3,…), which is combined with Mood (.7,-.2,.8,.3,.6,…) into a Feeling (.9,.6,.5,-.1,.8,…). Feelings sit in Short-Term Memory alongside Situation and Goals. Components are marked as either Knowledge or Architecture]

13

Learning task

Start

Goal

14

Learning task: Encoding
North
Passable: false
On path: false
Progress: true

East
Passable: false
On path: true
Progress: true

West
Passable: false
On path: false
Progress: true

South
Passable: true
On path: true
Progress: true

15

Learning task: Encoding & Appraisal
North
Intrinsic Pleasantness: Low
Goal Relevance: Low
Unpredictability: High

East
Goal Relevance: High

West
Goal Relevance: Low

South
Intrinsic Pleasantness: Neutral
Unpredictability: Low

16

Learning task: Attending,
Comprehending & Appraisal

South
Intrinsic Pleasantness: Neutral
Unpredictability: Low
Conduciveness: High
Control: High …

18

Learning task: Tasking

Optimal Subtasks

19

What is being learned?
• When to Attend vs. Task
• If Attending, what to Attend to
• If Tasking, which subtask to create
• When to Intend vs. Ignore
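
Each of the choices above can be framed as selecting an action in a state and adjusting its value from the reward received. The following is a generic tabular Q-learning sketch with epsilon-greedy selection, not the Soar agent's implementation; the slides do not say which update rule was used, and the class name, parameters, and state/action labels are hypothetical.

```python
import random
from collections import defaultdict


class QTable:
    """Tabular action values with epsilon-greedy selection (generic sketch)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)          # (state, action) -> estimated value
        self.alpha = alpha                   # learning rate
        self.gamma = gamma                   # discount factor
        self.epsilon = epsilon               # exploration probability
        self.rng = random.Random(seed)

    def choose(self, state, actions):
        """Explore with probability epsilon, otherwise pick the best-valued action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_actions):
        """One-step Q-learning update toward reward + gamma * best next value."""
        best_next = max((self.q[(next_state, a)] for a in next_actions), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


if __name__ == "__main__":
    table = QTable()
    # Hypothetical decision point: the agent has just encoded its surroundings.
    action = table.choose("encoded", ["attend", "task"])
    # The reward would come from the agent's feeling (intensity times valence).
    table.update("encoded", action, 0.4, "attended", ["intend", "ignore"])
    print(action, dict(table.q))
```

In this framing, a feeling-based reward arrives after every decision rather than only at the goal, which is what makes the per-step update informative.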

20

Learning Results
[Figure: median processing cycles (y-axis, 0 to 12000) by episode (x-axis, 1 to 15) for three agents: Standard RL, Feeling=Emotion, and Feeling=Emotion+Mood]

21

Results: With and without mood
[Figure: median processing cycles (y-axis, 240 to 300) by episode (x-axis, 8 to 15) for Feeling=Emotion and Feeling=Emotion+Mood, compared with Optimal]

22

Discussion
• Agent learns both internal (tasking) and external
(movement) actions
• Emotion provides more frequent rewards, so the
agent learns faster than with standard RL
• Mood “fills in the gaps”, allowing for even faster
learning and less variability

23

Conclusion & Future Work
• Demonstrated computational model that integrates
emotion and cognitive control
• Confirmed emotion can drive reinforcement learning
• We have already successfully demonstrated similar
learning in a more complex domain
• Would like to explore multi-agent scenarios
