Jennie sinsfadp06

Gradient Algorithms,
Robustness, and Partial
Observability
- In the context of Cortical Neural Control
using Rat Model

Jennie Si
Department of Electrical Engineering
Arizona State University

si@asu.edu NSF ADP 2006

Motivation/Challenge/Societal Impact

• Introduce an interesting platform to study the higher function
of the brain (the frontal cortical area and the motor area) in
decision and control using designed control tasks
• Use systems tools (ADP, MDP, CI…) to understand some
fundamental science questions
• Need to develop new tools: technology centered designs and
theory centered analysis
• Inspire new ways of thinking about complex systems


Background on cortical motor control

• Center-out task and preferred direction
• Population coding of movement direction and speed
• Motor cortical neural activity as a predictive signal, preceding
movement onset
• Brain-machine interface: open-loop vs. closed-loop solutions


Cortical neural signal extraction: non-invasive vs. invasive recording

• EEG
– Rhythms β and μ, P300, Slow cortical potential (SCP)
– Sampling rate 200-1000 Hz
– # of channels, from 1 or 2 to 128 or 256
• Electrodes
– Bioactive, allowing growth of nerve, or bio-inactive multiple
microwires or multichannel electrode arrays
– Superficial motor areas or deep brain structures
– Primary motor, parietal, premotor, frontoparietal, basal ganglia


Cortical neural signal extraction: ECoG

[Figure panels: electrodes for online control are circled; spectral correlations of ECoG with target location (color encodes patients); resting vs. imagining saying the word ‘move’.]
(d) Imagery is associated with a decrease in the µ (8–12 Hz) and β (18–26 Hz) bands.
Leuthardt et al. (2004) A brain–computer interface using electrocorticographic signals in humans. J. Neural Eng. 1: 63-71.


• Motor and thalamic regions
• Used a large number (40-60) of neurons
• Regressed the position of a water dripper arm
• Used a recurrent neural network

Chapin, J.K.; Moxon, K.A.; Markowitz, R.S.; and Nicolelis, M.A.L. (1999) Real-time control of a
robot arm using simultaneously recorded neurons in the motor cortex. Nature Neurosci.,
2:664-670.


a, b, Trial examples showing the movement by hand (green) and by neural reconstruction (blue) of a cursor
to a target (red). Dotted outlines represent the actual circumference of the target and cursor on the screen.
In a, hand motion resembles the neurally controlled cursor path; in b, no manipulandum motion occurred,
but the neurally controlled cursor reached the target. Each dot represents an estimate of position, updated
at 50-ms intervals. Axes are in x, y screen coordinates (1,000 units corresponds to a visual angle of 3.5°);
note that the two trials take place in different parts of the workspace.

• SERRUYA, HATSOPOULOS, PANINSKI, FELLOWS & DONOGHUE. Instant
neural control of a movement signal, NATURE 416 (6877): 141-142 MAR
14 2002
– Monkey, Utah array, motor cortex,
– 2D cursor position and velocity, Linear and Kalman Filters,
– a few (7–30) MI neurons
– careful calibration can lead to reasonable control without excessive training


• Taylor, Dawn M., Tillery, Stephen I. Helms, Schwartz, Andrew B., Direct Cortical
Control of 3D Neuroprosthetic Devices, Science 2002 296: 1829-1832
– Monkey, microwire, motor and pre-motor cortex
– 3D cursor velocity, adaptive version of Population Vectors
– Showed small numbers of neurons can be used to control a three dimensional cursor and
that neurons trained to control a cursor can control a real robot for feeding


• Carmena JM, Lebedev MA, Crist RE, et al., Learning to control a brain-
machine interface for reaching and grasping by primates, PLOS
BIOLOGY 1 (2): 193-208 NOV 2003
– Monkey,
– high density array of 128 microwires, Motor, Premotor, Supplementary Motor,
Posterior Parietal, and Sensory Cortex
– 2D cursor position and velocity and gripping force, Linear Filters


- Parietal reach region (PRR)
- Cognition-based prosthetic goal rather than trajectory
- Performance improved over a period of weeks.
- Expected value signals related to fluid preference, the expected magnitude, or
probability of reward were decoded simultaneously with the intended goal.

Musallam, S., Corneil, B. D., Greger, B., Scherberger, H., and Andersen, R. A. (2004).
"Cognitive Control Signals for Neural Prosthetics", Science, Vol 305, Issue 5681, 258-262


Driving tasks

• The arena for training rats to drive the
robot toward one of the lights


Question asked

• How does the rat develop a control strategy to complete
the driving tasks (under different time scale and spatial
complexity)?


Neuroscientific evidence

• Multimodal association area - anterior association area
(prefrontal cortex) integrating different sensory modalities and
linking them to action
• Macaque and rat prefrontal cortex receives multimodal
cortico-cortical projections from motor, somatosensory, visual,
auditory, gustatory, and limbic cortices
• Prefrontal areas provide cognitive, sensory or motivational
inputs for motor behavior (rostral region in rat)
• Motor areas are concerned with more concrete aspects of
movement (caudal region in rat)


One step at a time…

First, a directional control task with only high level control commands


The Brain-Controlled Vehicle

[Block diagram. Labeled components: Neural Signals, Neural Interface, Signal Processing Algorithms/Command Extraction, Directional control, Control Command, Vehicle, Sensors, Vehicle State Signal, Environmental Feedback.]


Goals

• To decode the directional control decision as a predictive
signal from motor cortical neural activities
• To associate motor neural activities with motor behavior and
thus to develop models that may help interpret the neural mechanism
of cortical motor directional control


• male Sprague-Dawley rats
• 2×4 arrays of 50µm tungsten wires coated with
polyimide
• spaced 500µm apart for a size of approximately
1.5mm×0.5mm.
• The implant site targets the rostral region

From Kolb, The Cerebral Cortex of the Rat, 1990

Brain Control Diagram

[Diagram: Neural Signals → Recording System → Spike times → Binned Data → Neural Activity Vector (NAV) → Computation of Directional Control Decision → Task Execution, with Feedback (Visual, Auditory & Reward).]

NAV: a $K \times L$ dimensional vector built from Neuron 1 ... Neuron L, each with Bin 1 ... K.
Directional control decision: $-1$ = Left, $+1$ = Right.
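The binning step in the diagram can be sketched as follows. This is a minimal illustration, not the lab's actual pipeline: the function names, the bin width, and the linear sign readout in `directional_decision` are all assumptions, since the slide does not say how the decision is computed from the NAV.

```python
import numpy as np

def neural_activity_vector(spike_times, t_start, bin_width, n_bins):
    """Concatenate per-neuron spike counts into one K*L vector.

    spike_times: list of L sequences of spike times (seconds), one per neuron.
    Returns [neuron 1 bins 1..K, neuron 2 bins 1..K, ...].
    """
    edges = t_start + bin_width * np.arange(n_bins + 1)
    counts = [np.histogram(np.asarray(s, dtype=float), bins=edges)[0]
              for s in spike_times]
    return np.concatenate(counts).astype(float)

def directional_decision(nav, weights, bias=0.0):
    """Hypothetical linear readout: +1 = Right, -1 = Left."""
    return 1 if float(weights @ nav) + bias > 0 else -1

# Two neurons (L = 2), three 100 ms bins each (K = 3).
spikes = [[0.05, 0.15, 0.16], [0.25]]
nav = neural_activity_vector(spikes, t_start=0.0, bin_width=0.1, n_bins=3)
```

Here the start time and bin width correspond to the "start/stop and bin size" choices that the later Technicalities slide describes as trial and error.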


Perievent Histograms
[Figure: perievent histograms for rat Rdar36, Left Hits vs. Right Hits. Panels (same units in both conditions): sig001a, sig002a, sig003a, sig003b, sig004a, sig004b, sig005a, sig005b, sig006a, sig007a, sig007b, sig008a. x-axis: Time (sec), from -2 to 2; y-axis: counts/bin.]


Cross validation accuracy boxplots for manual and brain
control respectively, 5 rats, 8 data sets

[Figure: "CV accuracy, Calibration and Brain control, all neurons". x-axis: Rat/Day (R1, R2, R3, R4/1, R4/2, R4/3, R5/1, R5/2); y-axis: CV Accuracy (0.4 to 1). Legend: Calib 25/75, Calib median, Brain 25/75, Brain median.]

• Each box shows the 25-75 quartile range and the median value of accuracy.
• For R3, R5/1, and R5/2, there are fewer than 30 trials in each brain control data set.

Typically 20 runs of randomized 5-fold cross-validation were performed for each data set.
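The repeated randomized 5-fold procedure can be sketched as below. The classifier is not named on the slide, so a nearest-centroid rule stands in purely as an assumption; `repeated_kfold_accuracy` and the synthetic data are likewise hypothetical.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assumed stand-in classifier: assign each test NAV to the closest class mean."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dist.argmin(axis=1)]

def repeated_kfold_accuracy(X, y, n_folds=5, n_runs=20, seed=0):
    """Return one cross-validated accuracy per randomized run."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_runs):
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        correct = 0
        for f in range(n_folds):
            test_idx = folds[f]
            train_idx = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            pred = nearest_centroid_predict(X[train_idx], y[train_idx], X[test_idx])
            correct += int((pred == y[test_idx]).sum())
        accuracies.append(correct / len(y))
    return np.array(accuracies)

# Synthetic, perfectly separable trials: label -1 = Left, +1 = Right.
X = np.vstack([np.full((20, 3), -1.0), np.full((20, 3), 1.0)])
y = np.array([-1] * 20 + [1] * 20)
acc = repeated_kfold_accuracy(X, y)
q25, median, q75 = np.percentile(acc, [25, 50, 75])
```

The three percentiles are the quantities a box in the plot would summarize for one data set.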


Modeling rat’s directional control using MDP?

MDPs:

Finite state space $S = \{1, 2, \ldots, n\}$
Finite action space $A_i = \{a_1^i, a_2^i, \ldots, a_m^i\}$
Infinite decision horizon $T = \{0, 1, 2, 3, \ldots\}$
Cost function $c(i, a)$, discount factor $\gamma$ ($0 < \gamma < 1$)
Action mapping $a : S \to A_i$, $a(i) \in A_i$
Stationary controller policy $\pi = (a, a, \ldots)$, $\pi \in \Pi_s$


Manual lever press following cue
Brain control - “imaginary lever press” following cue


Possible implementation
Define 6 possible states:
• Idle – between two trials
• Ready – right before trial start
• Reward – success of a trial
• No-Reward – failure of a trial
• Left experiment state – left cue experiment
• Right experiment state – right cue experiment

The action (control) is the rat’s volition represented by corresponding neural activities

Going from one state to another depends on the current state as well as the action
taken.

• The reward can be stated as
r (LL) = 1; r(LR)=-1 …
r (RR) = 1; r(RL)=-1 …


Does this tell us more?

• “Open loop” discrimination and CV analysis provide a baseline
of relating neural activity (spike trains) to behavioral
parameters (left/right decision)
• As a decoding tool, can an MDP model tell us more than “open
loop” analysis?
• MDP model to explain the experiment as a decision process


Technicalities

• How to represent control (start/stop and bin size)
Trial and error, hard to formulate theoretically

• How to compute the transition matrix given uncertain, partially
observed sequences of spike trains
We can try to formulate this theoretically…


• Uncertain transition matrices
– Robust value iteration (Nilim & El Ghaoui, 2005)
– Robust policy iteration (Satia & Lave, 1973)


Problem formulation

• Classification of uncertain transition matrices
– Expression of uncertain transition matrices

$$P = \begin{bmatrix} P_1^{a_1^1} \\ \vdots \\ P_i^{a_j^i} \\ \vdots \\ P_n^{a_m^n} \end{bmatrix} = \begin{bmatrix} f_1^{a_1^1}(U) \\ \vdots \\ f_i^{a_j^i}(U) \\ \vdots \\ f_n^{a_m^n}(U) \end{bmatrix}, \qquad P^\pi = \begin{bmatrix} P_1^{a(1)} \\ \vdots \\ P_n^{a(n)} \end{bmatrix} = \begin{bmatrix} f_1^{a(1)}(U) \\ \vdots \\ f_n^{a(n)}(U) \end{bmatrix}$$

$$\mathcal{P} = \{ P : U \in \mathcal{U} \}$$

Problem formulation
• Classification of uncertain transition matrices
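– Definition of uncertain transition matrices

The transition matrix $P$ is correlated if
$$\mathcal{P} \subset \mathcal{P}_1^{a_1^1} \times \cdots \times \mathcal{P}_i^{a_j^i} \times \cdots \times \mathcal{P}_n^{a_m^n}$$

The transition matrix $P$ is independent if
$$\mathcal{P} = \mathcal{P}_1^{a_1^1} \times \cdots \times \mathcal{P}_i^{a_j^i} \times \cdots \times \mathcal{P}_n^{a_m^n}$$

$\mathcal{P}_i^{a_j^i}$ is the projection of $\mathcal{P}$ on the direction of $P_i^{a_j^i}$ ($i \in S$, $a_j^i \in A_i$)
$\mathcal{P}^\pi$ is the projection of $\mathcal{P}$ on the direction of $\{ P_1^{a(1)}, P_2^{a(2)}, \ldots, P_n^{a(n)} \}$

[Figure: sets in the x–y plane with intervals $I_1$ and $I_2$: $S_1 = I_1 \times I_2$, $S_2 \subset I_1 \times I_2$.]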


Problem formulation
• Classification of MDPs
– MDPs with independent transition matrices
– MDPs with correlated transition matrices

• Optimality criterion
– Minimizing maximum value function for any initial state

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} v_P^\pi(i) = v^*(i) \quad \forall i \in S$$

• Stationary optimal policy pair
$(\pi^*, P^*)$ is optimal if
$$v_{P^*}^{\pi^*}(i) = \max_{P \in \mathcal{P}} v_P^{\pi^*}(i) = \min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} v_P^\pi(i) \quad \text{for any initial state } i \in S$$


Problem formulation

• MDPs with independent transition matrices
– An optimal policy pair exists
– Robust value iteration and robust policy iteration are applicable

• MDPs with correlated transition matrices
– An optimal policy pair exists and both iterations are applicable
– An optimal policy pair exists but both iterations are no longer
applicable
– An optimal policy pair does not exist


Questions to be answered

• Sufficient conditions to guarantee that robust value iteration and
robust policy iteration are applicable;

• An optimality criterion under which a stationary optimal policy
pair exists under a weaker condition;

• Efficient algorithm.


Sufficient conditions
Lemma
For any given $\pi = (a, a, \ldots) \in \Pi_s$ and any given $q \in \Re_+^{1 \times n}$,

$$\max_{v \in \Re^{n \times 1}} \Big\{ qv : v(i) \le (g^\pi(v))_i := c(i, a(i)) + \gamma \max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} P_i^{a(i)} v, \; i \in S \Big\} \quad (1)$$

For any given $q \in \Re_+^{1 \times n}$,

$$\max_{v \in \Re^{n \times 1}} \Big\{ qv : v(i) \le (g(v))_i := \min_{a \in A_i} \Big[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v \Big], \; i \in S \Big\} \quad (2)$$

The functions $g^\pi$ and $g$ are monotone non-decreasing and contractive.
Problems (1) and (2) have unique optimal solutions, denoted $v_\infty^\pi$ and $v_\infty$, which are the unique solutions to the fixed-point equations $v = g^\pi(v)$ and $v = g(v)$, respectively.
The optimal transition probability rows are given by

$$(P_i^{a(i)})^* \in \arg\max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} \{ P_i^{a(i)} v_\infty^\pi \}, \; i \in S, \text{ which constitute } (P^\pi)^* \quad (3)$$

$$(P_i^a)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} \{ P_i^a v_\infty \}, \; i \in S, \; a \in A_i, \text{ which constitute } P^* \quad (4)$$



Iterations for obtaining $v_\infty^\pi$
(1) select $v_0^\pi \in \Re^{n \times 1}$ and set $k = 0$;
(2) compute $v_{k+1}^\pi$ by $v_{k+1}^\pi = g^\pi(v_k^\pi)$;
(3) terminate if $v_{k+1}^\pi = v_k^\pi$ and output $v_\infty^\pi = v_k^\pi$;
otherwise, set $k = k + 1$ and go to (2)

Iterations for obtaining $v_\infty$
(1) select $v_0 \in \Re^{n \times 1}$ and set $k = 0$;
(2) compute $v_{k+1}$ by $v_{k+1} = g(v_k)$;
(3) terminate if $v_{k+1} = v_k$ and output $v_\infty = v_k$;
otherwise, set $k = k + 1$ and go to (2)



Theorem
If, for any $\pi \in \Pi_s$, $(P^\pi)^*$ defined by (3) is in the set $\mathcal{P}^\pi$,
and $P^*$ defined by (4) is in the set $\mathcal{P}$, then:
i) A stationary optimal policy pair exists
under the optimality criterion of
minimizing maximum value function
for any initial state;
ii) Robust value iteration is applicable;
iii) Robust policy iteration is applicable.


Robust value iteration

1. Select $v_0 \in \Re^n$ and set $k = 0$;
2. Compute $v_{k+1}$ by
$$v_{k+1}(i) = \min_{a \in A_i} \Big[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \Big]$$
3. If $v_{k+1} = v_k$, then go to 4; otherwise increment $k$ by 1 and go to 2
4. Compute $\pi^* = (a^*, a^*, \ldots)$ and $P^*$ defined by
$$a^*(i) \in \arg\min_{a \in A_i} \Big[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \Big]$$
$$(P_i^a)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} \{ P_i^a v_k \}$$
5. If $P^* \in \mathcal{P}$, output a stationary optimal policy pair $(\pi^*, P^*)$;
otherwise, the algorithm cannot be applied.
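Steps 1 to 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: each row uncertainty set is taken to be a finite list of candidate rows (`P_sets[i][a]`), the exact stopping test $v_{k+1} = v_k$ is replaced by a tolerance, and the membership check of step 5 is left to the caller because it depends on how the correlated set is described. The toy numbers and $\gamma = 0.9$ are hypothetical.

```python
import numpy as np

def robust_value_iteration(cost, P_sets, gamma, tol=1e-10, max_iter=100_000):
    """cost[i, a] = c(i, a); P_sets[i][a] = array of candidate rows for P_i^a."""
    n, m = cost.shape

    def q_values(v):
        # Inner max: nature picks the worst candidate row for each (i, a).
        return np.array([[cost[i, a] + gamma * np.max(P_sets[i][a] @ v)
                          for a in range(m)] for i in range(n)])

    v = np.zeros(n)                                  # step 1
    for _ in range(max_iter):
        v_next = q_values(v).min(axis=1)             # step 2
        converged = np.max(np.abs(v_next - v)) < tol # step 3 (tolerance assumed)
        v = v_next
        if converged:
            break
    policy = q_values(v).argmin(axis=1)              # step 4: a*(i)
    worst_rows = [[P_sets[i][a][np.argmax(P_sets[i][a] @ v)]
                   for a in range(m)] for i in range(n)]
    return v, policy, worst_rows

# Toy problem: every row may be any [u, 1 - u] with u on a coarse grid.
grid = np.array([[u, 1.0 - u] for u in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)])
cost = np.array([[1.0, 2.0], [3.0, 4.0]])
P_sets = [[grid, grid], [grid, grid]]
v, policy, worst_rows = robust_value_iteration(cost, P_sets, gamma=0.9)
```

In this toy case nature pushes all probability mass toward the costlier second state, so every worst-case row is `[0, 1]`.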


Robust policy iteration

1. Initialization: select $\pi_0 = (a_0, a_0, \ldots) \in \Pi_s$ and set $k = 0$;
2. Policy evaluation: do iteration for $v_\infty^{\pi_k}$;
3. Policy improvement: find $\pi_{k+1} = (a_{k+1}, a_{k+1}, \ldots)$
$$a_{k+1}(i) \in \arg\min_{a \in A_i} \Big[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_\infty^{\pi_k} \Big]$$
4. If $\pi_{k+1} = \pi_k$, compute $P^*$ by
$$(P_i^a)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} \{ P_i^a v_\infty^{\pi_k} \} \quad \forall i \in S, \; a \in A_i$$
and go to 5; otherwise increment $k$ by 1 and go to 2;
5. If $P^* \in \mathcal{P}$, output a stationary optimal policy pair $(\pi^*, P^*)$;
otherwise, the algorithm cannot be applied.
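A compact sketch of steps 1 to 4 follows, again assuming finite candidate-row sets and a tolerance-based evaluation loop; neither assumption comes from the slides, and the function names and toy data are hypothetical. The step 5 membership test is omitted.

```python
import numpy as np

def evaluate_policy(cost, P_sets, policy, gamma, tol=1e-10, max_iter=100_000):
    """Iterate v <- g^pi(v) until the update is below tol (step 2)."""
    n = cost.shape[0]
    v = np.zeros(n)
    for _ in range(max_iter):
        v_next = np.array([cost[i, policy[i]]
                           + gamma * np.max(P_sets[i][policy[i]] @ v)
                           for i in range(n)])
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    return v

def robust_policy_iteration(cost, P_sets, gamma, initial_policy, max_sweeps=1000):
    n, m = cost.shape
    policy = np.array(initial_policy, dtype=int)          # step 1
    v = evaluate_policy(cost, P_sets, policy, gamma)
    for _ in range(max_sweeps):
        q = np.array([[cost[i, a] + gamma * np.max(P_sets[i][a] @ v)
                       for a in range(m)] for i in range(n)])
        improved = q.argmin(axis=1)                       # step 3
        if np.array_equal(improved, policy):              # step 4
            break
        policy = improved
        v = evaluate_policy(cost, P_sets, policy, gamma)  # step 2
    worst_rows = [[P_sets[i][a][np.argmax(P_sets[i][a] @ v)]
                   for a in range(m)] for i in range(n)]
    return policy, v, worst_rows

rows = np.array([[1.0, 0.0], [0.0, 1.0]])  # each row is either "stay" or "jump"
cost = np.array([[1.0, 2.0], [3.0, 4.0]])
P_sets = [[rows, rows], [rows, rows]]
policy, v, worst_rows = robust_policy_iteration(cost, P_sets, 0.9, initial_policy=[1, 1])
```

Starting from the costlier actions in both states, one improvement sweep is enough here to reach the policy that also solves the value-iteration version of the same toy problem.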



Example
$S = \{1, 2\}$, $A_1 = A_2 = \{a_1, a_2\}$

$$P = \begin{bmatrix} P_1^{a_1} \\ P_1^{a_2} \\ P_2^{a_1} \\ P_2^{a_2} \end{bmatrix} = \begin{bmatrix} u_1 & 1 - u_1 \\ u_3 & 1 - u_3 \\ 1 - u_2 & u_2 \\ 1 - u_4 & u_4 \end{bmatrix}, \qquad \begin{aligned} c(1, a_1) &= 1 \\ c(1, a_2) &= 2 \\ c(2, a_1) &= 3 \\ c(2, a_2) &= 4 \end{aligned}$$

$U = \{u_1, u_2, u_3, u_4\}$, $W = \{0, 0.2, 0.4, 0.6, 0.8, 1\}$
$\mathcal{U} = \{ U : u_1 = u_3, \; u_2 = u_4; \; u_1, u_4 \in W \}$ ⇒ Correlated transition matrix $P$
Independent transition matrix for $\pi$, $P^\pi$
Optimal controller policy $\pi^* = (a^*, a^*, \ldots)$, $a^*(1) = a_1$, $a^*(2) = a_1$
Optimal nature policy
$$P^* = \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \in \mathcal{P}$$


New optimality criterion

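• Minimizing maximum squared total value function
$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \lVert V_P^\pi \rVert^2 \quad (5)$$

where the total value function is
$$\lVert V_P^\pi \rVert^2 = (V_P^\pi)' V_P^\pi, \qquad V_P^\pi = \big( v_P^\pi(1) \; \cdots \; v_P^\pi(i) \; \cdots \; v_P^\pi(n) \big)'$$

• Stationary optimal policy pair
$(\pi^*, P^*)$ is optimal if
$$\lVert V_{P^*}^{\pi^*} \rVert^2 = \max_{P \in \mathcal{P}} \lVert V_P^{\pi^*} \rVert^2 = \min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \lVert V_P^\pi \rVert^2$$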

New optimality criterion

• Existence of stationary optimal policy pair
Theorem:
Assuming that for any $\pi$, $\max_{P \in \mathcal{P}} \lVert V_P^\pi \rVert^2$ exists, a stationary optimal
policy pair $(\pi^*, P^*)$ exists in terms of (5)

• Relationship between the two optimality criteria
The optimality criterion of minimizing maximum squared total value
function generalizes the optimality criterion of minimizing maximum
value function for any initial state


Robust policy iteration under total value function

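• Policy evaluation
– Direct method
$$\max_{P \in \mathcal{P}} \lVert V_P^\pi \rVert^2 = \max_{P \in \mathcal{P}} \, (C^\pi)' \big[ (I - \gamma P^\pi)' \big]^{-1} (I - \gamma P^\pi)^{-1} C^\pi$$
– Iterative method
Iteration for $v_\infty^\pi$

[Figure: policy sets $\Pi_0$, $\Pi_1$, $\Pi_2$, $\Pi_3$ and $\pi^*$.]

• Policy improvement
– Policy improvement in robust policy iteration
$$a_{k+1}(i) \in \arg\min_{a \in A_i} \Big[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \Big]$$
– Controller policy elimination
Necessary condition for optimal policy at k-th iteration:
$$\lVert V_{P^{\pi_k}}^{\pi} \rVert^2 \le \lVert V_{P^{\pi_k}}^{\pi_k} \rVert^2$$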

Algorithm of robust policy iteration under total value function

1. Initialization: set $k = 0$, $\Pi_0 = \Pi_s$, $M = +\infty$ and select $\pi_0 = \{a_0, a_0, \ldots\}$
2. Policy evaluation:
   If the condition of iteration for $\pi_k$ is satisfied
   (a) use "iterative method" to compute $P^{\pi_k} \in \mathcal{P}$ and $\lVert V_{P^{\pi_k}}^{\pi_k} \rVert^2$ such that $\lVert V_{P^{\pi_k}}^{\pi_k} \rVert^2 = \max_{P \in \mathcal{P}} \lVert V_P^{\pi_k} \rVert^2$
   Else
   (b) use "direct method"
3. Policy improvement:
   (a) eliminate controller policies
   $$\Pi'_k = \big\{ \pi \in \Pi_k : \lVert V_{P^{\pi_k}}^{\pi} \rVert^2 \le \lVert V_{P^{\pi_k}}^{\pi_k} \rVert^2 \big\}$$
   If $|\Pi'_k| > 1$
      If the condition in Theorem is satisfied
      (b) Set $\Pi_{k+1} = \Pi'_k$ and $M = \lVert V_{P^{\pi_k}}^{\pi_k} \rVert^2$
      and select $\pi_{k+1} = \{a_{k+1}, a_{k+1}, \ldots\} \in \Pi_{k+1}$ by
      $$a_{k+1}(i) \in \arg\min_{a \in A_i} \Big\{ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \Big\}$$
      If $\pi_{k+1} = \pi_k$, go to 4; otherwise, set $k = k + 1$ and go to 2
      Else
      (c) If $\lVert V_{P^{\pi_k}}^{\pi_k} \rVert^2 < M$, set $M = \lVert V_{P^{\pi_k}}^{\pi_k} \rVert^2$ and $\Pi_{k+1} = \Pi'_k$, and then select $\pi_{k+1} \ne \pi_k \in \Pi_{k+1}$
      and set $k = k + 1$ and go to 2; otherwise, select $\pi'_k \in \Pi'_k - \{\pi_k\}$ and set
      $\Pi_k = \Pi'_k - \{\pi_k\}$ and $\pi_k = \pi'_k$ and go to 2
   Else
   (d) go to 4
4. Termination: output $(\pi_k, P^{\pi_k})$ as a stationary optimal policy pair

Remaining issues toward MDP model of the rat’s neural control strategy

How to estimate uncertain stationary transition matrices in Markov decision
processes using the experimental data collected from the rat’s cortical motor
areas while he performed his control tasks?

Proposed Solution:
Dempster-Shafer (D-S) theory of evidence is proposed as a new model for obtaining
a set estimate of the stationary transition matrix

The mathematics has been worked out; it still needs to be implemented as algorithms
and compared with existing models

Is a POMDP model more feasible? How?

More work needed to give the rat’s cortical neural control mechanism a
reasonable mathematical model


Acknowledgement

• Support by NSF under ECS-0002098 and ECS-0233529, and partially by
General Dynamics
• Support by ASU infrastructural funds
• Byron Olson and Jing Hu for work on rat experiment and analysis
• Baohua Li for robust dynamic programming results
• Jiping He for help with experiments
• Useful discussions with many (Dankert, L. Yang, C. Yang, Raghunathan …)
• Lab support by many (Silver, Scanlan, Tian…)

