1. Emotion based reward valuation, prediction,
and learning in the human brain
Seungyeon Kim
Department of BioSystems
Korea Advanced Institute of Science and Technology
February 8, 2007
2. ACKNOWLEDGMENTS
Thesis Committee: Jaeseung Jeong (Chair), Doheon Lee, Yong Jeong
Brain Dynamics Laboratory, KAIST: Kyongsik Yun, Hansem Sohn
Department of Psychiatry, College of Physicians and Surgeons, Columbia University: Yong-An Chung, Ron Whiteman
Reward Learning Laboratory, Division of Humanities and Social Sciences, California Institute of Technology: John P. O’Doherty, Alan Hampton
This work is supported by the Brain Dynamics Laboratory, Department of BioSystems, KAIST, and a KOSEF International Student Scholarship.
3. Neurobiology of choice behavior
“Value and efficiency”
[Background image: cropped excerpt of a text on comparing risks and benefits when making choices]
4. The Science of
Neuroeconomics
Social rejection Eisenberger et al. Science 2003
Moral Reasoning Greene et al. Neuron 2004
Regret Camille et al. Science 2004
Ambiguity Hsu et al. Science 2005
Trust Kosfeld et al. Nature 2005
Dread Berns et al. Science 2006
Ambiguity Huettel et al. Neuron 2006
Reward vs Risk Preuschoff et al. Neuron 2006
Purchase Knutson et al. Neuron 2007
Loss aversion Tom et al. Science 2007
5. Dopamine neurons and TD error
δ(t) = r(t) + γV(s(t+1)) - V(s(t))
[Figure: traces of reward r, value V, and TD error δ in three conditions: before learning, after learning, and omit reward (Schultz, Dayan, & Montague, 1997)]
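The three conditions in the figure can be reproduced with a small simulation. The following is a minimal sketch, not the model fitted in this work: it assumes a tapped-delay-line state representation (one state per time step after cue onset), a pre-cue state whose value is fixed at 0, and illustrative values for the learning rate `alpha`, the discount `gamma`, and the trial length. All function and variable names are hypothetical.

```python
def td_errors(V, reward_step, reward, gamma=1.0):
    """TD errors for one trial.

    V[t] is the value of the state t steps after cue onset. The pre-cue
    state is assumed to have value 0 because the cue arrives unpredictably.
    Entry 0 of the result is the error at cue onset; entry t + 1 is
    delta(t) = r(t) + gamma * V(s(t+1)) - V(s(t)).
    """
    n = len(V)
    deltas = [gamma * V[0]]  # cue onset: r = 0 and the previous value is 0
    for t in range(n):
        r = reward if t == reward_step else 0.0
        v_next = V[t + 1] if t + 1 < n else 0.0
        deltas.append(r + gamma * v_next - V[t])
    return deltas


def train(n_states, reward_step, reward, alpha=0.2, gamma=1.0, n_trials=500):
    """Learn V with the TD(0) update V(s(t)) += alpha * delta(t)."""
    V = [0.0] * n_states
    for _ in range(n_trials):
        for t in range(n_states):
            r = reward if t == reward_step else 0.0
            v_next = V[t + 1] if t + 1 < n_states else 0.0
            V[t] += alpha * (r + gamma * v_next - V[t])
    return V


N_STATES, REWARD_STEP = 5, 3  # illustrative trial length and reward time
naive = [0.0] * N_STATES
learned = train(N_STATES, REWARD_STEP, reward=1.0)

before = td_errors(naive, REWARD_STEP, reward=1.0)     # before learning
after = td_errors(learned, REWARD_STEP, reward=1.0)    # after learning
omitted = td_errors(learned, REWARD_STEP, reward=0.0)  # omit reward

for name, d in [("before learning", before),
                ("after learning", after),
                ("omit reward", omitted)]:
    print(name, [round(x, 2) for x in d])
```

Under these assumptions the error sits at the reward before learning, moves to cue onset after learning, and turns negative at the expected reward time when the reward is omitted.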
14. Key questions in this investigation
i. What are the neural substrates of disappointment and elation?
ii. Can emotion (disappointment/elation) influence our reward valuation?
iii. Can the TD model describe emotion-induced reward prediction in the brain, and if so, where?
15. Proposed hypotheses
i. Emotion can interact with reward valuation in the human brain.
ii. Emotion (e.g., disappointment, elation) can increase or decrease the rewarding experience along with the reward prediction errors, and can alter reward valuations.
iii. The TD model can describe emotion-based reward learning in the human brain, even with abstract reward in a non-conditioning paradigm.
16. fMRI Experiment & Data Acquisition
3T scanner (Oxford OR63) at KAIST fMRI Center,
Daejeon, Republic of Korea
24 horizontal slices, 3x3x3 mm resolution
TR=2 s, TE=35 ms, FOV=220 mm, Slice Thickness 3 mm
Oblique orientation of 30° to the AC line
17. Subjects
27 healthy, right-handed subjects recruited from CNU and KAIST (M:F = 14:13; mean age 21.1 years, range 19-25 years, SD 2.39).
KAIST IRB APPROVED
• After Reward Learning: N = 14 (M: 7, F: 7), all right-handed, healthy individuals with no prior history of neurological disease
• No Reward Learning: N = 13 (M: 7, F: 6), all right-handed, healthy individuals with no prior history of neurological disease
18. “Wheel of Numbers” Task
[Figure: task screen with labels Target Number, Winning Number, Betting, Balance, Game N, and Click acquisition]
20. fMRI DATA ANALYSIS OVERVIEW
[Figure: SPM analysis pipeline. Time-series data → realignment → normalisation (template) → smoothing (Gaussian kernel) → general linear model (design matrix) → parameter estimates → statistical parametric map (SPM) → statistical inference (Gaussian field theory), p < 0.05]
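The general linear model step of this pipeline can be illustrated for a single voxel. This is a minimal sketch under stated assumptions, not SPM's implementation: it fits one task regressor plus an intercept by ordinary least squares to a synthetic time series, and the data, the boxcar regressor, and the function name `fit_glm` are all hypothetical.

```python
import math


def fit_glm(y, x):
    """OLS fit of y = b0 + b1 * x + error for one voxel.

    Returns (b0, b1, t), where t is the t-statistic of the task effect b1.
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in residuals) / (n - 2)  # residual variance
    t = b1 / math.sqrt(sigma2 / sxx)
    return b0, b1, t


# Synthetic example: 40 scans, a boxcar task regressor (5 scans off, 5 on),
# a true effect of 2 signal units, and a small deterministic "noise" term.
x = ([0.0] * 5 + [1.0] * 5) * 4
y = [100.0 + 2.0 * xi + 0.1 * math.sin(1.7 * i) for i, xi in enumerate(x)]

b0, b1, t = fit_glm(y, x)
print(f"baseline = {b0:.2f}, task effect = {b1:.2f}, t = {t:.1f}")
```

In a full analysis the same fit is repeated at every voxel with a multi-column design matrix, and the resulting statistics form the statistical parametric map.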
22. No Reward Learning
Neural correlates of disappointment
Fixed probability 0.1
Time-locked to Target-Winning number mismatch
in 7/10 games in the fMRI scanning.
OFC BA 10
Right DLPFC BA 47
23. No Reward Learning
Neural correlates of Elation
Fixed probability 0.1
Time-locked to Target-Winning number match
in 3/10 games in the fMRI scanning.
right DLPFC BA47
right VLPFC BA46
Bilateral OFC BA10
24. Elation and disappointment activation sites
ELATION ACTIVATION SITES
VMPFC BA11, L (-38, 48, -8 mm; Z = 3.87)
OFC BA10, L (-26, 56, -2 mm; Z = 2.78)
OFC BA10, L (-20, 50, 4 mm; Z = 2.49)
VLPFC BA47, R (48, 40, -16 mm; Z = 2.68)
DLPFC BA46, R (50, 44, 0 mm; Z = 2.31)
OFC BA10, R (20, 64, 6 mm; Z = 2.18)
Inferior temporal area BA20, R (60, -36, -16 mm; Z = 2.14)
Middle temporal area BA21, R (66, -44, -12 mm; Z = 3.78)
Fusiform gyrus BA37, R (52, -52, -15 mm; Z = 3.20)
DISAPPOINTMENT ACTIVATION SITES
OFC BA10, R (12, 70, 8 mm; Z = 2.10)
DLPFC BA46, R (38, 38, -6 mm; Z = 2.02)
OFC BA10, R (35, 55, -4 mm; Z = 1.80)
OFC BA10, L (-14, 68, 6 mm; Z = 1.88)
DLPFC BA9, R (20, 62, 30 mm; Z = 1.86)
DLPFC BA9, R (12, 54, 32 mm; Z = 1.84)
ACC, R (24, 38, 6 mm; Z = 1.74)
OFC BA10, R (10, 56, 0 mm; Z = 1.73)
OFC BA10, L (-14, 58, 18 mm; Z = 1.68)
The brain regions showing significant correlation include the OFC, which is also involved in regret (Camille et al., 2004), and the ACC (24, 38, 6 mm; z = 2.74), which is involved in conflict monitoring (Kerns et al., 2004).
[Background figure: “Anatomy, variability and development of the human orbitofrontal cortex”, a human cytoarchitectonic map of the orbitofrontal cortex]
26. Emotion induced reward valuations
Before Learning
1. Unlearned target + reward = elation
2. Unlearned target + no reward = disappointment
After Learning
1. Learned target + no reward = big disappointment
2. Learned target + reward = small elation
3. Unlearned target + reward = big elation
4. Unlearned target + no reward = small disappointment
To isolate the emotional effects in the human brain during decision making, we kept the betting amount, the reward amount, and the probabilities constant.
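The ordering of the four after-learning cases follows from the size of the prediction error at outcome time. The sketch below is illustrative only: it assumes that the expected value of a target equals its win probability times a reward normalized to 1, using the probabilities stated on the slides (0.6 for the learned target, 0.1 for an unlearned target), and it reads the magnitude of δ = r − V as the intensity of elation or disappointment. The names are hypothetical.

```python
def prediction_error(reward, expected_value):
    """Outcome-time prediction error: delta = r - V(target)."""
    return reward - expected_value


# Assumed expected values: win probability times a reward normalized to 1.
V_LEARNED = 0.6    # high-probability target after reward learning
V_UNLEARNED = 0.1  # target at the fixed baseline probability

cases = {
    "learned target + no reward": prediction_error(0.0, V_LEARNED),
    "learned target + reward": prediction_error(1.0, V_LEARNED),
    "unlearned target + reward": prediction_error(1.0, V_UNLEARNED),
    "unlearned target + no reward": prediction_error(0.0, V_UNLEARNED),
}

for label, delta in cases.items():
    feeling = "elation" if delta > 0 else "disappointment"
    print(f"{label}: delta = {delta:+.1f} ({feeling})")
```

With these assumed values, a missed reward on the learned target gives the largest negative error and a reward on an unlearned target gives the largest positive error, matching the big/small labels in the list above.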
27. After Reward Learning
Neural correlates of disappointment revisited
High probability of Target 7
Target-Winning number mismatch in 3/10 games in the fMRI scanning.
Bilateral OFC, KE = 648
right OFC (10, 60, -18 mm; Z = 3.60)
left OFC (-4, 56, -18 mm; Z = 3.47)
[Figure: trial timeline with stages labelled decision possibilities shown, betting period (cue to bet shown), roulette moving, result revealed, and inter-round delay; durations shown: 10 s, 4 s, 8 s, 2 s, 4 s]
28. After Reward Learning VS No Reward Learning
Comparison of brain signals arising from monetary loss
Neural correlates of disappointment
Both disappointment conditions activate bilateral OFC, which is also involved in regret (Camille et al., 2004).
After reward learning, there are additional activations in the right hippocampus BA36 (32, -24, -26 mm; Z = 3.09; KE = 16) and the left precuneus BA7 (-18, -75, 52 mm; Z = 2.85; KE = 16), but no activation of ACC or DLPFC.
29. After Reward Learning
Neural correlates of elation revisited
High probability of Target 7 (0.6)
Target-Winning number match
in 4/10 games in the fMRI scanning.
No significant activations found
30. Disappointment increases OFC activity (voxel-voxel)
[Figure panels: No learned target, mismatch; No learning, mismatch; Learned target, mismatch]
31. Roles of Striatum in Elation (voxel-voxel)
Learned target Unlearned target
Target-Win match Target-Win match
32. TD learning describes emotional
learning and emotion-induced prediction of
reward in the human brain
33. After Reward Learning
High probability Target “7”
Reward prediction signal
Time-locked to Target Number 7 shown and awarded rewards during
3/10 games in the fMRI scanning.
Table 1. Activation for positive reward prediction-error
Region, Laterality, Cluster size, Z (max stat), Coordinates (X Y Z)
Putamen L 156 4.55 -22 -2 20
Caudate Body R 43 4.17 18 12 12
Supramarginal area BA 40 L 34 3.96 -62 -20 18
Superior Temporal Gyrus R 57 3.92 -40 -40 42
Inferior Frontal Gyrus L 44 3.84 42 -34 12
Inferior Frontal Gyrus L 17 3.74 -44 22 8
Caudate Body L 10 3.71 -10 12 8
Posterior Lobe L 15 3.67 -38 -66 -26
Supramarginal area BA 40 R 61 3.62 52 -58 40
Precuneus BA 7 L 18 3.55 -12 -72 52
Putamen R 7 3.52 28 -14 4
Postcentral Gyrus BA 3 R 36 3.42 58 -12 44
Insula R 3 3.40 36 20 12
Anterior Cingulate Cortex R 7 3.38 6 2 34
Supramarginal area BA 40 L 9 3.35 -50 -30 28
Caudate Body R 8 3.34 12 0 24
Superior Temporal Gyrus R 2 3.22 48 -24 10
Supramarginal area BA 40 L 1 3.22 -52 -46 56
Middle Frontal Gyrus BA 6 L 5 3.19 -26 -2 42
Hippocampus R 1 3.19 32 -34 -6
34. After Reward Learning
High probability Target “7”
Negative prediction error
Time-locked to Target Number 7 shown and awarded no rewards during
3/10 games in the fMRI scanning.
Insula: left cerebrum, sub-lobar, insula (6, 12, -2; Z = 3.32)
35. Reinforcement learning-based Regressor Analysis
Estimate V(t) and δ(t) from TD modeling results
Regression analysis of fMRI data
[Figure: analysis flow diagram with labels TASK, SUBJECT, MODEL; Temporal Difference learning model; reward timing & results; TD error δ(t); canonical HRF function convolution; fMRI data; SPM HRF extraction; peak activation; ROI on neural substrate of TD error δ(t)]
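The regressor construction in this flow, where a model-derived TD error series is convolved with a canonical HRF, can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used here: the HRF is a commonly used double-gamma form with assumed shape parameters 6 and 16 and an undershoot ratio of 1/6, the TR of 2 s is taken from the acquisition slide, and the event times and error values are hypothetical.

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (seconds); parameters are assumed defaults."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - ratio * under


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out


TR = 2.0  # seconds per scan
kernel = [double_gamma_hrf(k * TR) for k in range(16)]  # HRF over 0-30 s

# Hypothetical TD error series, one value per scan:
# a positive error at scan 5 and a negative error at scan 18.
delta = [0.0] * 30
delta[5] = 1.0
delta[18] = -1.0

regressor = convolve(delta, kernel)
print([round(v, 3) for v in regressor])
```

The resulting series would enter the design matrix as a parametric regressor, so voxels whose signal tracks it are candidate neural substrates of δ(t).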
39. TD Modeling Results VS Reward Prediction Signal
Brain regions showing significant correlation with the “Target Number 7” (CS) in the “Wheel of Numbers” task, time-locked to the target number being shown, after reward learning. HRF time series extracted from the SPM results were plotted against the HRF-convolved TD model.
[Figure legend: PE HRF, HRF Model]
40. TD Modeling Results VS Negative Prediction Error
Brain regions showing significant correlation with the “Target Number 7” in the “Wheel of Numbers” task, time-locked to the result being shown, after reward learning. HRF time series extracted from the SPM results were plotted against the HRF-convolved TD model.
[Figure legend: PE HRF, HRF Model]
41. Summary
i. What are the neural substrates of disappointment and elation?
Disappointment is correlated with activity in OFC, ACC, and DLPFC; elation with activity in OFC, VLPFC, and VMPFC.
ii. Can emotion alter our reward valuation?
Results show that activity in OFC, putamen, and caudate body increases linearly with increasing emotion (small to big disappointment/elation).
iii. Can the TD model describe emotion-induced reward prediction in the brain, and if so, where?
The left putamen and left insula are the brain regions where the TD model describes the reward learning computations.
42. References
Ashburner & Friston (1997):
Multimodal image coregistration and partitioning -
a unified framework.
NeuroImage 6(3):209-217
Lee D (2006):
Neural Basis of Quasi-Rational Decision-Making
Current Opinion in Neurobiology 16:191-198
Schultz W, Dayan P & Montague PR (1997):
A neural substrate of prediction and reward
Science 275: 1593-1599
Sutton RS & Barto AG (1998):
Reinforcement learning: An introduction
Cambridge, MA: MIT Press