difficult and engaging for human players. We used the same network
architecture, hyperparameter values (see Extended Data Table 1) and
learning procedure throughout, taking high-dimensional data (210 × 160
colour video at 60 Hz) as input, to demonstrate that our approach
robustly learns successful policies over a variety of games based solely
on sensory inputs with only very minimal prior knowledge (that is, merely
the input data were visual images, and the number of actions available
in each game, but not their correspondences; see Methods). Notably,
our method was able to train large neural networks using a reinforcement
learning signal and stochastic gradient descent in a stable manner,
as illustrated by the temporal evolution of two indices of learning (the
agent’s average score-per-episode and average predicted Q-values; see
Fig. 2 and Supplementary Discussion for details).
We compared DQN with the best performing methods from the
reinforcement learning literature on the 49 games where results were
available (refs 12, 15). In addition to the learned agents, we also report scores for
a professional human games tester playing under controlled conditions
and a policy that selects actions uniformly at random (Extended Data
Table 2 and Fig. 3, denoted by 100% (human) and 0% (random) on the y
axis; see Methods). Our DQN method outperforms the best existing
reinforcement learning methods on 43 of the games without incorporating
any of the additional prior knowledge about Atari 2600 games
used by other approaches (for example, refs 12, 15). Furthermore, our
DQN agent performed at a level comparable to that of a professional
human games tester across the set of 49 games, achieving more
than 75% of the human score on more than half of the games (29 games;
Figure 1 | Schematic illustration of the convolutional neural network. The
details of the architecture are explained in the Methods. The input to the neural
network consists of an 84 × 84 × 4 image produced by the preprocessing
map φ, followed by three convolutional layers (note: snaking blue line
symbolizes sliding of each filter across input image) and two fully connected
layers with a single output for each valid action. Each hidden layer is followed
by a rectifier nonlinearity (that is, max(0, x)).
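The caption defers layer sizes to the Methods. As a rough check of how an 84 × 84 × 4 input shrinks through three convolutions, the sketch below computes output widths and parameter counts. The filter sizes, strides and widths (32 filters of 8 × 8 with stride 4, 64 of 4 × 4 with stride 2, 64 of 3 × 3 with stride 1, then 512 hidden units) are the values commonly reported for DQN and are an assumption here, as are unpadded ("valid") convolutions and the 18-action output.

```python
# Sketch: spatial sizes and parameter counts for a DQN-style network.
# Layer hyperparameters are assumed, not taken from this excerpt.

def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output width of an unpadded ("valid") convolution."""
    return (size - kernel) // stride + 1


def dqn_shapes(n_actions: int = 18):
    # (out_channels, kernel, stride) for the three convolutional layers
    convs = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
    size, channels = 84, 4                            # 84 x 84 x 4 preprocessed input
    sizes, params = [], 0
    for out_ch, k, s in convs:
        params += k * k * channels * out_ch + out_ch  # weights + biases
        size, channels = conv_out(size, k, s), out_ch
        sizes.append(size)
    flat = size * size * channels                     # flattened conv output
    hidden = 512
    params += flat * hidden + hidden                  # first fully connected layer
    params += hidden * n_actions + n_actions          # one output per valid action
    return sizes, flat, params


sizes, flat, params = dqn_shapes()
print(sizes, flat, params)  # [20, 9, 7] 3136 1693362
```

Most of the parameters sit in the first fully connected layer (3136 × 512 weights), which is typical of this kind of architecture.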
[Figure 2, panels a, b, c and d (plot residue): average score per episode (panels a and c) and average action value (Q) (panels b and d), each plotted against training epochs from 0 to 200.]
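The comparison above places scores on a scale where the random policy maps to 0% and the human tester to 100%. A minimal sketch of that normalization follows, assuming the linear form 100 × (agent − random) / (human − random); the formula is our assumption here, and the game names and scores are made up for illustration.

```python
def human_normalized(agent: float, human: float, random_score: float) -> float:
    """Percentage of the human-random gap closed by the agent (0% = random, 100% = human)."""
    return 100.0 * (agent - random_score) / (human - random_score)


# Hypothetical per-game scores: (agent, human, random)
games = {
    "game_a": (80.0, 100.0, 0.0),
    "game_b": (55.0, 100.0, 10.0),
    "game_c": (300.0, 200.0, 0.0),
}
normalized = {g: human_normalized(*s) for g, s in games.items()}
# Games where the agent reaches more than 75% of the human score
above_75 = sorted(g for g, v in normalized.items() if v > 75.0)
print(normalized)  # {'game_a': 80.0, 'game_b': 50.0, 'game_c': 150.0}
print(above_75)    # ['game_a', 'game_c']
```

Normalizing against the random baseline keeps games with very different raw score ranges comparable on one axis.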
RESEARCH LETTER
IEEE ROBOTICS & AUTOMATION MAGAZINE MARCH 2016
regression [2]. With a function approximator, the sampled
data from the approximated model can be generated by inappropriate
interpolation or extrapolation that improperly updates the policy
parameters. In addition, if we aggressively derive the analytical
gradient of the approximated model to update the policy, the
approximated gradient might be far from the true gradient of the
objective function due to the model approximation error. If we
consider using these function approximation methods for
high-dimensional systems like humanoid robots, this problem becomes
more serious, because high-dimensional dynamics models are difficult
to approximate with a limited amount of data sampled from real systems.
On the other hand, if the environment is extremely stochastic, a limited
amount of previously acquired data might not capture the real
environment’s properties and could lead to inappropriate policy updates.
However, rigid-body dynamics models, such as a humanoid robot model, do
not usually include large stochasticity. Therefore, our approach is
suitable for real-robot learning in high-dimensional systems like
humanoid robots.
Moreover, applying RL to actual robot control is difficult,
since it usually requires many learning trials that cannot be executed
in real environments, and the real system’s durability is
limited. Previous studies used prior knowledge or properly designed
initial trajectories to apply RL to a real robot and improve the robot
controller’s parameters [1], [4], [10], [19], [32].
We applied our proposed learning method to our humanoid
robot [7] (Figure 13) and showed that it can accomplish two
different movement-learning tasks: the cart-pole swing-up task
without any prior knowledge, and the basketball-shooting task with
only a very simple nominal trajectory.
The proposed recursive use of previously sampled data to
improve policies for real robots would also be useful for
other policy search algorithms, such as reward-weighted
regression [11] or information-theoretic approaches [12], and
investigating how these combinations work is an interesting
direction for future study.
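Reusing previously sampled data in PGPE can be illustrated with importance weighting. The sketch below is not the authors' implementation: it assumes a single scalar policy parameter θ drawn from a Gaussian with hyperparameters (η, σ), a made-up return R(θ) = −(θ − 2)², and a simple self-normalized baseline. Samples drawn under old hyperparameters are reweighted by p(θ | η_new, σ_new) / p(θ | η_old, σ_old) to estimate the gradient at the new hyperparameters without new robot trials.

```python
import math
import random


def log_normal_pdf(x: float, mean: float, std: float) -> float:
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def toy_return(theta: float) -> float:
    # Hypothetical task: return is maximized at theta = 2.
    return -(theta - 2.0) ** 2


def iw_pgpe_gradient(samples, eta_old, sigma_old, eta_new, sigma_new):
    """Importance-weighted PGPE gradient w.r.t. (eta, sigma) at the new hyperparameters.

    samples: list of (theta, return) pairs collected under (eta_old, sigma_old).
    """
    weights = [
        math.exp(log_normal_pdf(th, eta_new, sigma_new) - log_normal_pdf(th, eta_old, sigma_old))
        for th, _ in samples
    ]
    # Self-normalized weighted mean return as a baseline (a simple, not optimal, choice).
    baseline = sum(w * r for w, (_, r) in zip(weights, samples)) / sum(weights)
    g_eta = g_sigma = 0.0
    for w, (th, r) in zip(weights, samples):
        adv = w * (r - baseline)
        g_eta += adv * (th - eta_new) / sigma_new ** 2                             # d log N / d eta
        g_sigma += adv * ((th - eta_new) ** 2 - sigma_new ** 2) / sigma_new ** 3   # d log N / d sigma
    n = len(samples)
    return g_eta / n, g_sigma / n


rng = random.Random(0)
old_samples = []
for _ in range(20000):
    th = rng.gauss(0.0, 1.0)  # drawn under the old hyperparameters (eta = 0, sigma = 1)
    old_samples.append((th, toy_return(th)))

# Reuse the same samples to estimate the gradient at shifted hyperparameters.
g_eta, g_sigma = iw_pgpe_gradient(old_samples, 0.0, 1.0, 0.5, 1.0)
print(g_eta, g_sigma)  # close to the analytic values 3 and -2 for this toy task
```

For this toy task the expected return is −((η − 2)² + σ²), so the analytic gradient at (0.5, 1) is (3, −2); the reweighted estimate recovers it from data collected at (0, 1).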
Conclusions
In this article, we proposed reusing the previous experiences
of a humanoid robot to efficiently improve its task performance.
We proposed recursively using the off-policy PGPE method to improve
the policies and applied our approach to cart-pole swing-up and
basketball-shooting tasks. In the former, we introduced a real-virtual
hybrid task environment composed of a motion controller and virtually
simulated cart-pole dynamics. By using the hybrid environment, we can
potentially design a wide variety of different task environments. Note
that complicated arm movements of the humanoid robot need to be learned
for the cart-pole swing-up. Furthermore, by using our proposed method,
the challenging basketball-shooting task was successfully accomplished.
Future work will develop a method based on a transfer
learning [28] approach to efficiently reuse the previous experiences
acquired in different target tasks.
Acknowledgment
This work was supported by MEXT KAKENHI Grant
23120004, MIC-SCOPE, “Development of BMI Technologies
for Clinical Application” carried out under SRPBS by
AMED, and NEDO. Part of this study was supported by JSPS
KAKENHI Grant 26730141. This work was also supported by
NSFC 61502339.
References
[1] A. G. Kupcsik, M. P. Deisenroth, J. Peters, and G. Neumann, “Data-efficient contextual policy search for robot movement skills,” in Proc. National Conf. Artificial Intelligence, 2013.
[2] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press, 2006.
[3] C. G. Atkeson and S. Schaal, “Robot learning from demonstration,” in Proc. 14th Int. Conf. Machine Learning, 1997, pp. 12–20.
[4] C. G. Atkeson and J. Morimoto, “Nonparametric representation of policies and value functions: A trajectory-based approach,” in Proc. Neural Information Processing Systems, 2002, pp. 1643–1650.
Figure 13. The humanoid robot CB-i [7]. (Photo courtesy of ATR.)
Learning Dexterous In-Hand Manipulation
OpenAI*
(b) A picture of the radio-controlled vehicle.
meters such as the focal length and tilt angle, or the
b) shows a picture of the real robot with a TV camera
eriments.
stem, we will briefly review the basics of Q-learning.
We follow the explanation of Q-learning by Kaelbling
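A minimal tabular sketch may help fix the Q-learning update reviewed here: Q(s, a) ← Q(s, a) + α [r + γ max_a′ Q(s′, a′) − Q(s, a)]. The five-state chain, its reward and the constants are invented for illustration. The behaviour policy is uniformly random, which Q-learning tolerates because it is off-policy.

```python
import random

N_STATES, GOAL = 5, 4  # states 0..4; state 4 is terminal
LEFT, RIGHT = 0, 1


def step(state: int, action: int):
    """Deterministic chain: moving right from 3 into 4 pays 1 and ends the episode."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(200):                       # cap on episode length
            a = rng.choice((LEFT, RIGHT))          # uniformly random behaviour policy
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q[s2])  # no bootstrap at terminal
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


q = q_learning()
greedy = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(GOAL)]
print(greedy)                 # [1, 1, 1, 1]: move right in every non-terminal state
print(round(q[0][RIGHT], 3))  # 0.729 = 0.9 ** 3
```

Although the agent never acts greedily during learning, the learned table supports the optimal greedy policy, which is the off-policy property the review builds on.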
ation as a Reinforcement Learning Problem
learning framework we introduced above, where the goal is to find the update
at minimizes the meta-loss. Intuitively, we think of the agent as an optimization
and the environment as being characterized by the family of objective functions
ike to learn an optimizer for. The state consists of the current iterate and some
ong the optimization trajectory so far, which could be some statistic of the history
s, iterates and objective values. The action is the step vector that is used to update
formulation, the policy is essentially a procedure that computes the action, which
vector, from the state, which depends on the current iterate and the history of
iterates and objective values. In other words, a particular policy represents a
update formula. Hence, learning the policy is equivalent to learning the update
nd hence the optimization algorithm. The initial state probability distribution is
stribution of the initial iterate, gradient and objective value. The state transition
distribution characterizes what the next state is likely to be given the current
action. Since the state contains the gradient and objective value, the state
probability distribution captures how the gradient and objective value are likely to
any given step vector. In other words, it encodes the likely local geometries of the
unctions of interest. Crucially, the reinforcement learning algorithm does not
access to this state transition probability distribution, and therefore the policy it
ds overfitting to the geometry of the training objective functions.
a cost function of a state to be the value of the objective function evaluated at the
ate. Because reinforcement learning minimizes the cumulative cost over all time
sentially minimizes the sum of objective values over all iterations, which is the
e meta-loss.
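To make the state/action/cost correspondence concrete, here is a small sketch in which a "policy" is an update formula. The state holds the current iterate and the gradient history, the action is the step vector, and the meta-loss is the sum of objective values along the trajectory. The quadratic objective and the momentum-style policy family (parameterized by a step size and a decay) are illustrative assumptions, not the learned optimizer described in the text.

```python
from typing import Dict, List

Vector = List[float]


def objective(x: Vector, curv: Vector) -> float:
    # Hypothetical ill-conditioned quadratic: f(x) = 0.5 * sum(c_i * x_i^2)
    return 0.5 * sum(c * xi * xi for c, xi in zip(curv, x))


def gradient(x: Vector, curv: Vector) -> Vector:
    return [c * xi for c, xi in zip(curv, x)]


def policy(state: Dict[str, list], lr: float, beta: float) -> Vector:
    """Action = step vector computed from the gradient history (momentum-style)."""
    step = [0.0] * len(state["x"])
    for k, g in enumerate(reversed(state["grads"])):  # k = 0 is the most recent gradient
        for i, gi in enumerate(g):
            step[i] -= lr * (beta ** k) * gi
    return step


def meta_loss(lr: float, beta: float = 0.0, steps: int = 20) -> float:
    curv = [1.0, 10.0]
    state = {"x": [1.0, 1.0], "grads": []}
    total = 0.0
    for _ in range(steps):
        state["grads"].append(gradient(state["x"], curv))  # state records the new gradient
        action = policy(state, lr, beta)
        state["x"] = [xi + ai for xi, ai in zip(state["x"], action)]
        total += objective(state["x"], curv)                # cost = objective at new iterate
    return total


print(meta_loss(lr=0.1), meta_loss(lr=0.01))
```

A learned optimizer would search over such policies to minimize this cumulative cost across a family of objectives rather than a single quadratic.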
an optimization algorithm on the problem of training a neural net on MNIST, and
n the problems of training different neural nets on the Toronto Faces Dataset
AR-10 and CIFAR-100. These datasets bear little similarity to each other: MNIST
black-and-white images of handwritten digits, TFD consists of grayscale images
aces, and CIFAR-10/100 consists of colour images of common objects in natural
d using our approach on MNIST (shown in light
FAR-100 and outperforms other optimization
on algorithms learned using our approach, we
dimensional logistic regression problems and
rameters. It is worth noting that the behaviours
s and high dimensions may be different, and so
ve of the behaviours of optimization algorithms
e some useful intuitions about the kinds of
ctories followed by various algorithms on two
ms. Each arrow represents one iteration of an
thm learned using our approach (shown in light
IBM Research / Center for Business Optimization
[Figure 2 diagram residue. Components shown include a Modeling and Optimization Engine (Modeler, Optimizer, State Generator); inputs such as Case Inventory, Allocation Rules, Resource Constraints and Business Rules; an Event Listener receiving event notifications from Systems 1 to 3; a Rule Processor, Recommended Actions (TP ID, Rec. Date, Rec. Action, Start Date), an Action Handler and a Scheduler (new case, case extract, time-expired events); Taxpayer State tables (current and history, keyed by TP ID with features Feat 1 to Feat n); and Feature Definitions (XML/XSLT).]
Figure 2: Overall collections system architecture.
3.1 GENERATE MODEL DESCRIPTIONS WITH A CONTROLLER RECURRENT NEURAL NETWORK
In Neural Architecture Search, we use a controller to generate architectural hyperparameters of
neural networks. To be flexible, the controller is implemented as a recurrent neural network.
Suppose we would like to predict feedforward neural networks with only convolutional layers; we
can use the controller to generate their hyperparameters as a sequence of tokens:
Figure 2: How our controller recurrent neural network samples a simple convolutional network. It
predicts filter height, filter width, stride height, stride width, and number of filters for one layer and
repeats. Every prediction is carried out by a softmax classifier and then fed into the next time step
as input.
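A toy version of the sampling loop in the caption: for each layer the controller predicts filter height, filter width, stride height, stride width and number of filters, each with a softmax, and it stops once a maximum layer count is reached. For brevity this sketch replaces the RNN with a table of logits conditioned only on the previous token, so the previous prediction is still fed into the next step; the vocabularies, the logit table and names such as `sample_architecture` are hypothetical.

```python
import math
import random

# Hypothetical vocabularies for the five per-layer decisions.
VOCAB = {
    "filter_height": [1, 3, 5, 7],
    "filter_width": [1, 3, 5, 7],
    "stride_height": [1, 2, 3],
    "stride_width": [1, 2, 3],
    "num_filters": [24, 36, 48, 64],
}
STEPS = list(VOCAB)
PREV_SIZE = max(len(v) for v in VOCAB.values())  # range of the previous-token index


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_controller(seed=0):
    """Logits per (decision type, previous token index): a stand-in for RNN state."""
    rng = random.Random(seed)
    return {
        name: [[rng.gauss(0.0, 0.1) for _ in VOCAB[name]] for _ in range(PREV_SIZE)]
        for name in STEPS
    }


def sample_architecture(params, max_layers, rng):
    layers, prev = [], 0                    # prev = index of the last sampled token
    while len(layers) < max_layers:         # stop once the layer budget is reached
        layer = {}
        for name in STEPS:
            probs = softmax(params[name][prev])  # softmax classifier for this step
            idx = rng.choices(range(len(probs)), weights=probs)[0]
            layer[name] = VOCAB[name][idx]
            prev = idx                      # prediction is fed into the next step
        layers.append(layer)
    return layers


arch = sample_architecture(init_controller(), max_layers=3, rng=random.Random(1))
for layer in arch:
    print(layer)
```

Raising `max_layers` over training mimics the schedule in which the layer limit grows as training progresses.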
In our experiments, the process of generating an architecture stops if the number of layers exceeds
a certain value. This value follows a schedule where we increase it as training progresses. Once the
controller RNN finishes generating an architecture, a neural network with this architecture is built
and trained. At convergence, the accuracy of the network on a held-out validation set is recorded.
The parameters of the controller RNN, θc, are then optimized in order to maximize the expected
validation accuracy of the proposed architectures. In the next section, we will describe a policy
gradient method which we use to update parameters θc so that the controller RNN generates better
architectures over time.
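The policy-gradient update referred to here can be sketched with REINFORCE: raise the log-probability of the sampled tokens in proportion to (R − b), where R is the child network's validation accuracy and b is a moving-average baseline. Training real child networks is out of scope, so `fake_accuracy` below is a hypothetical stand-in that rewards only 64 filters, and the controller makes a single decision to keep the example short.

```python
import math
import random

CHOICES = [24, 36, 48, 64]  # hypothetical "number of filters" vocabulary


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fake_accuracy(num_filters: int) -> float:
    # Hypothetical reward: pretend only 64-filter children do well.
    return 1.0 if num_filters == 64 else 0.0


def reinforce(iterations=500, lr=0.5, decay=0.9, seed=0):
    rng = random.Random(seed)
    theta_c = [0.0] * len(CHOICES)  # controller parameters (logits)
    baseline = 0.0
    for _ in range(iterations):
        probs = softmax(theta_c)
        idx = rng.choices(range(len(CHOICES)), weights=probs)[0]
        reward = fake_accuracy(CHOICES[idx])
        advantage = reward - baseline  # baseline uses past rewards only
        for j in range(len(theta_c)):
            # gradient of log softmax(theta_c)[idx] w.r.t. theta_c[j] is onehot - probs
            grad_log = (1.0 if j == idx else 0.0) - probs[j]
            theta_c[j] += lr * advantage * grad_log  # gradient ascent on expected reward
        baseline = decay * baseline + (1 - decay) * reward  # exponential moving average
    return softmax(theta_c)


probs = reinforce()
print([round(p, 3) for p in probs])  # probability mass concentrates on 64 filters
```

Subtracting the baseline does not change the expected gradient but reduces its variance, which matters when each reward requires training a full child network.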
3.2 TRAINING WITH REINFORCE
The list of tokens that the controller predicts can be viewed as a list of actions a1:T to design an
architecture for a child network. At convergence, this child network will achieve an accuracy R on
a held-out dataset. We can use this accuracy R as the reward signal and use reinforcement learning
The last few years have seen much success of deep neural networks in m
cations, such as speech recognition (Hinton et al., 2012), image recognitio
Krizhevsky et al., 2012) and machine translation (Sutskever et al., 2014; Bah
et al., 2016). Along with this success is a paradigm shift from feature de
designing, i.e., from SIFT (Lowe, 1999), and HOG (Dalal & Triggs, 2005), to
et al., 2012), VGGNet (Simonyan & Zisserman, 2014), GoogleNet (Szeg
ResNet (He et al., 2016a). Although it has become easier, designing archit
lot of expert knowledge and takes ample time.
Figure 1: An overview of Neural Architecture Search.
This paper presents Neural Architecture Search, a gradient-based method for
tures (see Figure 1) . Our work is based on the observation that the structure
* Work done as a member of the Google Brain Residency program (g.co/brain
arXiv:1611.0
Complementary roles of basal ganglia and cerebellum in learning and motor control Doya
[Figure residue (Current Opinion in Neurobiology): schematic pairing supervised learning (input, output, target, error) with the cerebellum, reinforcement learning (input, output, reward) with the basal ganglia, and unsupervised learning (input, output) with the cerebral cortex; the thalamus, inferior olive and substantia nigra are also labelled.]
$$V^{\pi}(s) = \mathbb{E}_{\pi,P}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, s_0 = s\right]$$

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi,P}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, s_0 = s, a_0 = a\right]$$

$$V^{*} = \max_{\pi} V^{\pi}, \qquad Q^{*} = \max_{\pi} Q^{\pi}$$

$$V^{*} = V^{\pi^{*}}, \qquad Q^{*} = Q^{\pi^{*}}$$
sha1_base64="oXdA2MhrfetmrVgD8WBSpAu34Ik=">AAAIiXicpVU/b9NQEL+2/DEp0BQWJJaIqBVClfVc2hIqVaroAGPTkrRSk0a285JacWxjO4Fg5QuwIwYmkBgQ4lOwsCOGSvABEGORWBi49+wmcZzalmrLvnvnu9/9fHfPVixdc1xCjqemZy5cvHRZuJKZvXrt+lx2/kbZMTu2SkuqqZv2viI7VNcMWnI1V6f7lk3ltqLTPaW1xZ7vdantaKbx1O1ZtNqWm4bW0FTZRVMt+7h86FVU2/Lu9fu5xY3c6bJiaYfM1s8sVZ515HqmOOK3kSuOu9WyeSISfuSiihQoeQiObXN+5jVUoA4mqNCBNlAwwEVdBxkcPA9AAgIW2qrgoc1GTePPKfQhg7Ed9KLoIaO1hfcmrg4Cq4FrhunwaBWz6HjZGJmDBfKdfCQn5Cv5RH6Rf2dieRyDcemhVPxYatXmXt3a/ZsY1UbpwtEwKpazCw0ocK4acre4hb2F6sd3X7452V3fWfAWyXvyG/m/I8fkC76B0f2jfijSnbcxfBTk4lfMwPVzXo0252dg/T2022ivow+reZVbWPYXvB+nlfMGXh7k0a+fgKlgXIf3Kh516JcOt8l5sNokIY96hrHraG3wStNQBLuzTM2Aj4qZe1hJphNYwkuEVZQSSn8dzzWMF892PHcavnW+L1qhGo5ylgLOa1wm843ixXOelD9ND1WUyrnRk6oynNdwH0VYDvUwfV3S7ZNo/jSs2Z638a5PnD02d4Wgmw8Hs5iGexg3nvk4hzS8ZdSO+DdP4RMcnUFxwPc+rAwmspCCexQ7nv8kLmnewe9UUgdIaGpGZdK0M9R0X9czOoB/V2n8XxpVysuiRESpuJLffBT8ZwW4DXfgLqI8gE14AttQwqyf4Rv8gJ/CrCAJBWHdd52eCmJuQugQtv4D+LXEFw==</latexit><latexit sha1_base64="oXdA2MhrfetmrVgD8WBSpAu34Ik=">AAAIiXicpVU/b9NQEL+2/DEp0BQWJJaIqBVClfVc2hIqVaroAGPTkrRSk0a285JacWxjO4Fg5QuwIwYmkBgQ4lOwsCOGSvABEGORWBi49+wmcZzalmrLvnvnu9/9fHfPVixdc1xCjqemZy5cvHRZuJKZvXrt+lx2/kbZMTu2SkuqqZv2viI7VNcMWnI1V6f7lk3ltqLTPaW1xZ7vdantaKbx1O1ZtNqWm4bW0FTZRVMt+7h86FVU2/Lu9fu5xY3c6bJiaYfM1s8sVZ515HqmOOK3kSuOu9WyeSISfuSiihQoeQiObXN+5jVUoA4mqNCBNlAwwEVdBxkcPA9AAgIW2qrgoc1GTePPKfQhg7Ed9KLoIaO1hfcmrg4Cq4FrhunwaBWz6HjZGJmDBfKdfCQn5Cv5RH6Rf2dieRyDcemhVPxYatXmXt3a/ZsY1UbpwtEwKpazCw0ocK4acre4hb2F6sd3X7452V3fWfAWyXvyG/m/I8fkC76B0f2jfijSnbcxfBTk4lfMwPVzXo0252dg/T2022ivow+reZVbWPYXvB+nlfMGXh7k0a+fgKlgXIf3Kh516JcOt8l5sNokIY96hrHraG3wStNQBLuzTM2Aj4qZe1hJphNYwkuEVZQSSn8dzzWMF892PHcavnW+L1qhGo5ylgLOa1wm843ixXOelD9ND1WUyrnRk6oynNdwH0VYDvUwfV3S7ZNo/jSs2Z638a5PnD02d4Wgmw8Hs5iGexg3nvk4hzS8ZdSO+DdP4RMcnUFxwPc+rAwmspCCexQ7nv8kLmnewe9UUgdIaGpGZdK0M9R0X9czOoB/V2n8XxpVysuiRESpuJLffBT8ZwW4DXfgLqI8gE14AttQwqyf4Rv8gJ/CrCAJBWHdd52eCmJuQugQtv4D+LXEFw==</latexit><latexit 
sha1_base64="oXdA2MhrfetmrVgD8WBSpAu34Ik=">AAAIiXicpVU/b9NQEL+2/DEp0BQWJJaIqBVClfVc2hIqVaroAGPTkrRSk0a285JacWxjO4Fg5QuwIwYmkBgQ4lOwsCOGSvABEGORWBi49+wmcZzalmrLvnvnu9/9fHfPVixdc1xCjqemZy5cvHRZuJKZvXrt+lx2/kbZMTu2SkuqqZv2viI7VNcMWnI1V6f7lk3ltqLTPaW1xZ7vdantaKbx1O1ZtNqWm4bW0FTZRVMt+7h86FVU2/Lu9fu5xY3c6bJiaYfM1s8sVZ515HqmOOK3kSuOu9WyeSISfuSiihQoeQiObXN+5jVUoA4mqNCBNlAwwEVdBxkcPA9AAgIW2qrgoc1GTePPKfQhg7Ed9KLoIaO1hfcmrg4Cq4FrhunwaBWz6HjZGJmDBfKdfCQn5Cv5RH6Rf2dieRyDcemhVPxYatXmXt3a/ZsY1UbpwtEwKpazCw0ocK4acre4hb2F6sd3X7452V3fWfAWyXvyG/m/I8fkC76B0f2jfijSnbcxfBTk4lfMwPVzXo0252dg/T2022ivow+reZVbWPYXvB+nlfMGXh7k0a+fgKlgXIf3Kh516JcOt8l5sNokIY96hrHraG3wStNQBLuzTM2Aj4qZe1hJphNYwkuEVZQSSn8dzzWMF892PHcavnW+L1qhGo5ylgLOa1wm843ixXOelD9ND1WUyrnRk6oynNdwH0VYDvUwfV3S7ZNo/jSs2Z638a5PnD02d4Wgmw8Hs5iGexg3nvk4hzS8ZdSO+DdP4RMcnUFxwPc+rAwmspCCexQ7nv8kLmnewe9UUgdIaGpGZdK0M9R0X9czOoB/V2n8XxpVysuiRESpuJLffBT8ZwW4DXfgLqI8gE14AttQwqyf4Rv8gJ/CrCAJBWHdd52eCmJuQugQtv4D+LXEFw==</latexit>
•
!
–
–
–
•
!
–
–
–
difficult and engaging for human players. We used the same network
architecture, hyperparameter values (see Extended Data Table 1) and
learning procedure throughout, taking high-dimensional data (210 × 160
colour video at 60 Hz) as input, to demonstrate that our approach
robustly learns successful policies over a variety of games based solely
on sensory inputs with only very minimal prior knowledge (that is, merely
that the input data were visual images and the number of actions available
in each game, but not their correspondences; see Methods). Notably,
our method was able to train large neural networks using a reinforcement
learning signal and stochastic gradient descent in a stable manner, as
illustrated by the temporal evolution of two indices of learning (the
agent's average score per episode and average predicted Q-values; see
Fig. 2 and Supplementary Discussion for details).
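To make "a reinforcement learning signal and stochastic gradient descent" concrete, here is a minimal sketch of the quantity being minimized. It assumes the standard DQN formulation, which this section does not spell out: the signal is the Q-learning temporal-difference error measured against a separate target network. The function names, the plain-Python lists standing in for network outputs, and the default discount of 0.99 are illustrative assumptions; a real implementation would work on batched tensors and let an autodiff library take the gradient with respect to the online network's weights.

```python
from typing import List, Sequence


def td_targets(rewards: Sequence[float],
               next_q_target: Sequence[Sequence[float]],
               terminal: Sequence[bool],
               gamma: float = 0.99) -> List[float]:
    """Q-learning targets y = r + gamma * max_a' Q_target(s', a'), or y = r at episode end."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, terminal):
        # No bootstrapping from the next state once the episode has ended.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def td_loss(q_online: Sequence[Sequence[float]],
            actions: Sequence[int],
            targets: Sequence[float]) -> float:
    """Mean squared TD error between the online network's Q(s, a) and the targets."""
    errors = [(y - q[a]) ** 2 for q, a, y in zip(q_online, actions, targets)]
    return sum(errors) / len(errors)


# Hypothetical minibatch of two transitions with made-up Q-values.
targets = td_targets([1.0, 0.0], [[0.5, 2.0], [1.0, 3.0]], [False, True], gamma=0.5)
print(targets)                                              # [2.0, 0.0]
print(td_loss([[1.0, 1.5], [0.5, -1.0]], [0, 1], targets))  # 1.0
```

Only the online network's Q(s, a) would receive gradients here; the targets are treated as constants, which, together with experience replay, is what DQN relies on for stable training.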
We compared DQN with the best performing methods from the
reinforcement learning literature on the 49 games where results were
available (refs 12, 15). In addition to the learned agents, we also report scores for
a professional human games tester playing under controlled conditions
and for a policy that selects actions uniformly at random (Extended Data
Table 2 and Fig. 3, denoted by 100% (human) and 0% (random) on the y
axis; see Methods). Our DQN method outperforms the best existing
reinforcement learning methods on 43 of the games without incorporating
any of the additional prior knowledge about Atari 2600 games
used by other approaches (for example, refs 12, 15). Furthermore, our
DQN agent performed at a level comparable to that of a professional
human games tester across the set of 49 games, achieving more
than 75% of the human score on more than half of the games (29 games;
[Figure 1 schematic, labels only: Convolution, Convolution, Fully connected, Fully connected; example output action: No input]
Figure 1 | Schematic illustration of the convolutional neural network. The
details of the architecture are explained in the Methods. The input to the neural
network consists of an 84 × 84 × 4 image produced by the preprocessing
map φ, followed by three convolutional layers (note: snaking blue line
symbolizes sliding of each filter across input image) and two fully connected
layers with a single output for each valid action. Each hidden layer is followed
by a rectifier nonlinearity (that is, max(0, x)).
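The caption gives the overall layout but defers layer sizes to the Methods. As a sketch, assuming the sizes usually reported for DQN (32 filters of 8 × 8 with stride 4, 64 of 4 × 4 with stride 2, 64 of 3 × 3 with stride 1, then 512 rectifier units), the following traces the tensor shape from the 84 × 84 × 4 input through each layer. The helper names are hypothetical, and 18 actions (the full Atari 2600 action set) is used only for illustration, since the number of valid actions varies by game.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int) -> int:
    """Spatial output size of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def dqn_layer_shapes(n_actions: int, height: int = 84, width: int = 84,
                     frames: int = 4) -> List[Tuple[str, Tuple[int, ...]]]:
    # Assumed (filters, kernel, stride) for the three convolutional layers.
    conv_layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
    hidden_units = 512  # assumed width of the first fully connected layer
    h, w, c = height, width, frames
    shapes = [("input", (h, w, c))]
    for i, (filters, kernel, stride) in enumerate(conv_layers, start=1):
        h, w, c = conv_out(h, kernel, stride), conv_out(w, kernel, stride), filters
        shapes.append((f"conv{i} + ReLU", (h, w, c)))
    shapes.append(("flatten", (h * w * c,)))
    shapes.append(("fc + ReLU", (hidden_units,)))
    # Second fully connected layer: linear, one Q-value per valid action.
    shapes.append(("output: one Q-value per action", (n_actions,)))
    return shapes


for name, shape in dqn_layer_shapes(n_actions=18):
    print(name, shape)
```

Working the first layer by hand: (84 − 8)/4 + 1 = 20, so conv1 yields 20 × 20 × 32; the same rule gives 9 × 9 × 64 and then 7 × 7 × 64 = 3,136 features feeding the 512-unit fully connected layer.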
[Figure 2, panels a, b, c and d: plots not recoverable from the extraction; two panels show average score per episode against training epochs (0 to 200).]
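The comparison with the professional tester and the random policy uses those two as fixed reference points, 100% (human) and 0% (random). Assuming the linear normalization 100 × (agent score − random score) / (human score − random score), which reproduces both reference points, a minimal sketch of the rescaling and of counting games above 75% on that scale follows. Function names are hypothetical, and the three games' scores are invented for illustration; they are not results from the paper.

```python
from typing import Dict, List, Tuple


def normalized_score(agent: float, human: float, random_policy: float) -> float:
    """Score in percent on a scale where random play is 0% and the human tester is 100%."""
    if human == random_policy:
        raise ValueError("human and random scores must differ")
    return 100.0 * (agent - random_policy) / (human - random_policy)


def games_above(results: Dict[str, Tuple[float, float, float]],
                threshold: float = 75.0) -> List[str]:
    """Names of games whose normalized score is strictly above the threshold."""
    return sorted(name for name, (agent, human, rnd) in results.items()
                  if normalized_score(agent, human, rnd) > threshold)


# Made-up (agent, human, random) scores for three hypothetical games.
results = {
    "game_a": (300.0, 200.0, 0.0),   # 150%: agent beats the human tester
    "game_b": (50.0, 100.0, 0.0),    # 50%
    "game_c": (80.0, 100.0, 20.0),   # exactly 75%, so not "more than 75%"
}
print(games_above(results))  # ['game_a']
```

Normalizing against the random policy, not just dividing by the human score, keeps games with very different raw score ranges comparable on one axis.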
[Figure residue, panels a and b, labels only: Human expert positions, Self-play positions; Classification, Regression, Self Play, Policy gradient; Rollout policy, SL policy network, RL policy network, Value network, Policy network; p(a|s), s, s′]