SlideShare ist ein Scribd-Unternehmen logo
1 von 6
Downloaden Sie, um offline zu lesen
Proceedings of the European Computing Conference 
Generic Reinforcement Schemes and Their Optimization 
DANA SIMIAN, FLORIN STOICA 
Department of Informatics 
“Lucian Blaga” University of Sibiu 
Str. Dr. Ion Ratiu 5-7, 550012, Sibiu 
ROMANIA 
dana.simian@ulbsibiu.ro, florin.stoica@ulbsibiu.ro 
Abstract: - The aim of this paper is to introduce a generic two-parameters dependent absolutely expedient 
reinforcement scheme and to present a method for learning parameters optimization. We optimize, using a Breeder 
genetic algorithm, many schemes derived from our generic one, in order to reach the best performance. Furthermore, 
we compare our results in terms of speed and efficiency. 
Key-Words: - Reinforcement Learning, Breeder genetic algorithm, Optimization 
1 Introduction 
Reinforcement schemes represent algorithms which 
realize the learning process for stochastic learning 
automata. Stochastic learning automata adapt to changes 
in their environment as results of a reinforcement 
learning process. Given a set of possible actions, a 
stochastic learning automaton must choose the optimal 
one, based on the environment response and the past 
actions. Initially equal probabilities are associated to all 
possible actions, one action is selected at random and the 
actions probabilities are updated based on the 
environment response. A detailed characterization of 
reinforcement learning can be found in [14]. In [7] is 
underlined that the major advantage of reinforcement 
learning is that it needs information about the 
environment only for the reinforcement signal. 
Reinforcement learning has several applications in 
autonomic robotics, designing multi-agent systems, 
intelligent vehicles control, etc. ([2], [3], [5], [15], [16]). 
In [11] we designed a simulator of an intelligent vehicle 
control system. The system was based on two learning 
automata. 
In other articles ([9], [13]), we defined new 
reinforcement schemes in order to reach a best 
performance of our system. Usually, reinforcement 
schemes depends on many parameters, as we can see in 
section 3. An important problem is to choose the optimal 
scheme’s parameters. 
The aim of this paper is to introduce a generic 
reinforcement scheme from which many other 
reinforcement schems can be obtained and to present an 
optimization method of these schemes with respect to 
learning parameters. We also optimize this new scheme 
and those that we introduced in [9], [12]. We evaluate 
and compare our schemes using two criteria: the speed 
of the optimization process and the efficiency of the 
optimized schema. 
The remainder of this paper is organized as follows. In 
section 2 we briefly present the mathematical 
backgrounds of stochastic learning automata with 
variable structure. In section 3 is presented our generic 
absolutely expedient reinforcement scheme, together 
with other particular schemes derived from it. In section 
3 we present our optimization method for reinforcement 
schemes learning parameters and analyze the provided 
results. Conclusions and further directions of study are 
presented in section 5. 
2 Mathematical backgrounds of stochastic 
automata 
A stochastic automaton supposes the existence of a set of 
actions, which define the input of the environment and a 
response set. The range of the response values depends 
on the model we chose. There are three different models 
for representation of the response values: P-model, S-model 
and Q-model. The P-model uses a set of binary 
values, 0 or 1. In the S-model the response values are 
continuous in the range (0, 1). In the Q-model the 
response set is a finite set of discrete values in the range 
(0, 1). In this paper we use the P-model for our 
reinforcement schemes. 
A stochastic automaton selects one action at random, 
observes the response from the environment and updates 
the action probabilities based on that response. An action 
can be rewarded or punished using a set of penalties 
probabilities. 
Mathematical model of a stochastic automaton is defined 
by a triple {α , c,β } corresponding to the elements 
presented before: 
a) α ={α1 ,α 2 ,...,α r } - the input actions of the 
environment 
b) β - the response set. 
In the case of P-model, { 1 , 2} β = β β is a binary set: 
ISBN: 978-960-474-297-4 332
Proceedings of the European Computing Conference 
β = 0 is a favourable outcome and β =1 is an 
unfavourable outcome. 
To reefer the time instant is used the notationα (n) , 
β (n) . 
c) c ={c1 , c2 ,..., cr } - the set of penalty probabilities. 
The element i c is the probability that action i α 
will 
result in an unfavourable response: 
ci = P(β (n) =1|α (n) =α i ) i =1, 2, ..., r 
The evolution in time of penalty probabilities defines 
two types of environments: stationary (the penalty 
probabilities are constant over time) and nonstationary 
(the penalties change over time). 
In the following we consider only stationary random 
environments. 
The action probabilities vector at time moment n+1 is 
updated using a mapping T and the current probabilities 
pi (n) = P(α (n) =α i ), i =1, r : 
p(n +1) = T[ p(n),α (n),β (n)] 
Reinforcement schemes are named linear if p(n +1) is 
a linear function of p(n) , and nonlinear otherwise. 
The evaluation of performances of a learning automaton 
is made using a quantitative norm of behavior ([17]) 
represented by the average penalty for a given action 
probability vector, M(n). 
M n P n p n 
( ) ( ( ) 1| ( )) 
= β 
= = 
r r 
=Σ = = ∗ = =Σ 
P n n P n cpn 
( β ( ) 1| α ( ) α ) ( α ( ) α 
) ( ) 
i i i i 
i i 
1 1 
= = 
The only class of reinforcement schemes for which 
necessary and sufficient conditions of design are 
available is represented by absolutely expedient learning 
schemes, defined in [7]. An automaton is absolutely 
expedient if M(n +1) < M(n) for all n ([7]). 
The general solution for absolutely expedient schemes 
was found by Lakshmivarahan and Thathachar in [4]. 
Other studies about expedient learning algorithms can be 
found in [8]. 
In [17] is presented a nonlinear absolutely expedient 
reinforcement scheme, for a stationary N-teacher P-model 
environment. In the case of N-teacher model, if 
the automaton produced the action i α 
and the responses 
from environments (or “teachers”) are denoted by 
j j N 
β i =1,..., , then the updating rules are: 
⎤ 
( 1) ( ) 1 β φ ( ( )) 
Σ Σ 
≠ = 
= 
− ∗ ⎥⎦ 
⎡ 
⎢⎣ 
+ = + 
r 
j 
j i 
j 
N 
k 
k 
i i i p n 
N 
p n p n 
1 1 
⎤ 
1 1 β ψ ( ( )) (1) 
Σ Σ 
≠ = 
= 
∗ ⎥⎦ 
⎡ 
− − 
⎢⎣ 
r 
j 
j i 
j 
N 
k 
k 
i p n 
N 1 1 
( 1) ( ) 1 ( ( )) 
+ = − ⎡ ⎤ ∗ + ⎢⎣ ⎥⎦ 
p n p n p n 
1 
1 1 ( ( )), . 
⎡ ⎤ 
+ ⎢ − ⎥ ∗ ∀ ≠ ⎣ 1 
⎦ 
N 
k 
j j i j 
k 
N 
k 
i j 
k 
N 
p n j i 
N 
β φ 
β ψ 
= 
= 
Σ 
Σ 
(2) 
i φ 
and i ψ 
satisfy the following conditions: 
p n 
p n 
r λ 
1 = = = p n ≤ 
( ( )) 0 
( ( )) 
( ) 
... 
( ( )) 
1 
( ) 
p n 
p n 
r 
φ φ 
(3) 
p n 
p n 
r μ 
1 = = = p n ≤ 
( ( )) 0 
( ( )) 
( ) 
... 
( ( )) 
1 
( ) 
p n 
p n 
r 
ψ ψ 
(4) 
r 
Σ 
pi n j p n 
( ) + φ ( ( )) > 
0 (5) 
i j j 
1 
≠ = 
r 
Σ 
pi n j p n 
( ) − ψ ( ( )) < 
1 (6) 
i j j 
1 
≠ = 
p j (n) +ψ j ( p(n)) > 0 (7) 
p j (n) −φ j ( p(n)) <1 (8) 
for all j∈{1,..., r}  {i} 
In [1] and [15] is proved that the automaton with the 
reinforcement scheme given in (1)-(2) is absolutely 
expedient in a stationary environment if the functions 
λ ( p(n)) and μ ( p(n)) satisfy the following conditions: 
λ ( p(n)) ≤ 0 
μ ( p(n)) ≤ 0 (9) 
λ ( p(n)) + μ ( p(n)) < 0 
3 Generic absolutely expedient 
reinforcement scheme 
In the following we present a generic two-parameter 
dependent reinforcement schemes and prove that this 
scheme is absolutely expedient in a stationary 
environment. We start from the scheme given in (1) – 
(2). This scheme is also valid for a single-teacher model. 
In this case we will define a single environment response 
denoted by f . 
Thus, the updating rules become: 
pn pn f Hn pn 
( 1) ( ) ( ( )) [1 ( )] 
(1 ) ( ) [1 ( )] 
+ = + ∗ − ∗ ∗ − − 
i i 1 
i 
f pn 
2 
i 
γ 
γ 
− − ∗ − ∗ − 
pn pn f Hn 
p n f p n 
( 1) ( ) ( ( )) 
( ) (1 ) ( ) ( ) 
+ = − ∗ − γ 
∗ ∗ 
1 
∗ + − ∗ − ∗ 
γ 
2 
j j 
j j 
(10) 
for all j ≠ i , i.e.: 
2 ( ( )) ( ) k k ψ p n = −γ ∗ p n 
1 ( ( )) ( ) ( ) k k φ p n = −γ ∗H n ∗ p n 
ISBN: 978-960-474-297-4 333
Proceedings of the European Computing Conference 
where learning parameters 1 γ and 2 γ are real values 
γ 1,γ 2 ∈(0,1) (11) 
The function H is defined as: 
⎧ 
( ) min 1; max min ( ) , 
{ { 
= ⎨ i 
− ⎩ ∗ (1 − 
( )) 
1 
i 
H n p n 
p n 
ε 
γ 
}} 
⎛ ⎫ 1 − () 
⎞ ⎪ ⎜⎜ − ∗ ⎟⎟ ⎬ ⎝ 1 ⎠ 1, 
⎪⎭ 
;0 
( ) 
j 
j j r 
j i 
p n 
p n 
ε 
γ =≠ 
Parameter ε is an arbitrarily small positive real number. 
Our reinforcement scheme differs from schemes given in 
[15]-[17], by the definition of H and φ k . 
We will show that are satisfied all the conditions of the 
reinforcement scheme (1) - (2). 
From (3), (4) we have: 
p n H n p n Hn pn 
p n p n 
( ( )) ( ) ( ) ( ) ( ( )) 
( ) ( ) 
φ γ 
− ∗ ∗ 
= 1 
=− ∗ = 1 
(3’) 
k k 
k k 
γ λ 
p n p n p n 
p n p n 
( ( )) ( ) ( ( )) 
( ) ( ) 
ψ γ 
− ∗ 
= 2 
=− = (4’) 
2 
k k 
k k 
γ μ 
The conditions (5) – (8) become: 
p n H n p n 
H n p n 
( ) ( ) (1 ( )) 0 
( ) ( ) 
− ∗ ∗ − > ⇔ 
< 
i 1 
i 
1 
i 
p n 
(1 ( )) 
i 
γ 
γ 
∗ − 
(5’) 
Condition (5’) is satisfied by the definition of the 
function H(n) . 
2 ( ) (1 ( )) 1 i i p n +γ ∗ − p n < (6’) 
But 2 ( ) (1 ( )) ( ) 1 ( ) 1 i i i i p n +γ ∗ − p n < p n + − p n = 
since 2 0 <γ <1 
2 ( ) ( ) 0, {1,..., }{ } j j p n −γ ∗ p n > ∀j∈ r i (7’) 
But 2 2 ( ) ( ) ( ) (1 ) 0 j j j p n −γ ∗ p n = p n ∗ −γ > 
since 2 0 <γ <1 and 0 < p j (n) <1 for all 
j∈{1,..., r}{i} 
1 
p n 
1 − 
() 
1 
( ) ( ) ( ) 1 ( ) 
( ) 
j 
j j 
j 
p n H n p n H n 
p n 
γ 
γ 
+ ∗ ∗ < ⇔ < 
∗ 
(8’) 
∀j∈{1,..., r}{i} . 
This condition is satisfied by the definition of the 
function H(n) . 
Therefore our reinforcement scheme is a candidate for 
absolute expediency. 
Furthermore, the functions λ and μ for our nonlinear 
scheme satisfy: 
1 λ ( p(n)) = −γ ∗H(n) ≤ 0 
2 μ ( p(n)) = −γ ≤ 0 
1 2 λ ( p(n)) +μ ( p(n)) = −γ ∗H(n) −γ < 0 
In conclusion, the algorithm given in equations (10) is 
absolutely expedient in a stationary environment. This 
algorithm defines a two-parameter dependent generic 
absolutely expedient reinforcement scheme. We will 
denote this scheme by 2 
1 Rγ 
γ . Choosing different 
expressions for the parameters, such that (11) holds, we 
obtained several absolutely expedient reinforcement 
schemes. 
In [9] we introduced and studied the scheme *(1 ) 
(1 )* Rθ δ 
− 
− θ δ 
, 
with 0 <θ <1 and 0 <δ <1. Obviously 
0 <θ *(1−δ ) <1 and 0 < (1−θ )*δ <1, therefore this 
is a absolutely expedient reinforcement scheme. 
In [12] we introduced the scheme θ 
Rθ *δ , with 0 <θ <1 
and 0 <θ ∗δ <1. 
4. Optimization of two-parameters 
reinforcement schemes 
A very important problem is to find the optimal values 
of learning parameters in the scheme 2 
1 Rγ 
γ in order to 
reach the best performance. In [13], we introduced first 
the idea of learning parameters optimization in a 
reinforcement scheme using genetic algorithms. We 
develop here this idea and use a Breeder genetic 
algorithm, for providing the optimal learning parameters 
for the generic scheme 2 
1 Rγ 
γ . We also apply the method 
for the particular schemes presented in section 3. 
Furthermore, we compare our results in terms of speed 
and efficiency. For the simplicity of notations, we 
consider, in our comparisons, the scheme Rθ 
δ , with 
1 Rγ 
δ ,θ ∈(0,1) instead of 2 
γ . 
The aim is to find optimal values for the learning 
parameters δ and θ in the schemes: Rθ 
δ , *(1 ) 
(1 )* Rθ δ 
− 
and 
− θ δ 
θ 
Rθ *δ . 
Because parameters are real values, we use the Breeder 
genetic algorithm, proposed by Mühlenbein and 
Schlierkamp-Voosen in [6], which represents solutions 
(chromosomes) as vectors of real numbers. This 
algorithm is closer to the reality than normal genetic 
algorithms which use discrete representation of 
solutions. The skeleton of the Breeder genetic algorithm 
can be found in [13]. The selection is achieved randomly 
from the T% best elements of current population, where 
T is a constant of the algorithm (usually, T = 40 provide 
best results). Thus, within each generation, two elements 
selected from the T% best chromosomes are subject to 
crossover operation. On the new child obtained from the 
mate of the parents is applied the mutation operator. The 
ISBN: 978-960-474-297-4 334
Proceedings of the European Computing Conference 
process is repeated until are obtained N-1 new 
individuals, where N represents the size of the initial 
population. The best chromosome (evaluated through 
fitness function) is inserted in the new population (1- 
elitism). Thus, the new population will have also N 
elements. 
Let be 1,..., { }i i n x x = = and 1,..., { }i i n y y = = two 
chromosomes. The Breeder crossover operator gives a 
new chromosome z, whose genes are represented by 
( ) i i i i i z x y x α = + − , i=1,…,n, with i α 
a random 
variable uniformly distributed between [−ε ,1+ε ], ε 
depends on the problem to be solved and typically is in 
the interval [0,0.5] . 
The mutation operator gives i i i i i x = x + s ⋅ r ⋅a , i=1,…n 
with { 1,1} i s ∈ − uniform at random, 
i xi r = r ⋅ domain , 
r∈[0.1, 0.5] (typically 0.1) , 2 k 
i a = − ⋅α withα ∈[0,1] 
uniform at random and k is the number of bytes used to 
represent a number in the machine within is executed the 
Breeder algorithm (mutation precision). 
The probability of mutation is typically choosed as 1/ n . 
In order to find the best values for learning parameters 
δ and θ of our reinforcement schemes and to compare 
the results, we consider the same example we used in 
[9], [13]. We used our reinforcement schemes for robot 
navigation in the grid world presented in Fig. 1. The 
current position of the robot is marked by a circle. 
Navigation is done using four actionsα ={N, S, E,W} , 
corresponding to the four possible movements along the 
coordinate directions. 
Fig. 1. Grid world for robot navigation 
We have a single optimal action (movement to S). In the 
learning process, only this action receives reward. 
Initially, we choose for the optimal action a small 
probability value (0.0005). We stop the execution when 
the probability of the optimal action, popt, reaches a 
certain value (popt=0.9999). 
We make the performance evaluation of our schemes 
using the “number of steps” of the learning algorithm 
until the stop condition is achieved. 
Using the Breeder genetic algorithm, we can provide the 
optimal learning parameters for our schemes, in order to 
reach the best performance. 
Each chromosome contains two genes, representing the 
real values δ and θ . The fitness function for 
chromosomes evaluation is represented by the number of 
steps necessary by the learning process to reach the 
value 0.9999 for the probability of the optimal action. 
The parameters of Breeder algorithm are assigned with 
following values: δ = 0 , r = 0.1, k = 8 . The initial 
population has 400 chromosomes and algorithm is 
stopped after 1000 generations. 
The results provided by the Breeder genetic algorithm 
are presented in Table 1. 
Optimal values for learning parameters 
provided by the Breeder algorithm 
4 actions with 
(0) 0.0005, 
(0) = 
0.9995 / 3 
= 
opt 
p 
i≠opt 
p 
Scheme 4.1 
θ 
Rδ 
Scheme 4.2 
θ 
Rθ *δ 
Scheme 4.3 
*(1 ) 
(1 )* 
θ − 
δ 
R − 
θ δ 
δ 0.5866 0.7036 0.5741 
θ 0.9469 0.8983 0.3640 
Average 
number of 
steps to reach 
16.95 16.98 43.70 
popt=0.9999 
Table 1. Optimal values for learning parameters 
provided by the Breeder genetic algorithm 
Fig. 2. Schema optimization vs. time passed 
In figure 2 is presented the optimization process for 
reinforcement schemes analyzed in Table 1, using two 
dimensions of data: the time passed vs. the performance 
evaluation of optimized scheme (number of steps 
necessary to reach the stop conditions of the learning 
process). 
In figure 3 is presented the optimization process using as 
dimensions of data the number of generations in the 
Breeder algorithm vs. the performance evaluation of 
optimized scheme. 
ISBN: 978-960-474-297-4 335
Proceedings of the European Computing Conference 
Fig. 3. Schema optimization vs. number of generations 
in Breeder algorithm 
With results obtained in Table 1, we can conclude that 
Breeder genetic algorithm is capable to provide the best 
values for learning parameters, and thus our schemes 
were optimized for best performance. The results 
obtained by our nonlinear optimized schemes are 
significant better than those obtained in [10], [12], [17]. 
5 Conclusions 
Using a Breeder genetic algorithm, we found 
automatically the optimal values for the learning 
parameters of many reinforcement schemes, in order to 
reach the best performance, measured in number of 
iterations in learning process (“number of steps”). 
From graphical results of optimization process showed 
in Fig.2 and Fig. 3, we can conclude that scheme 4.3, 
R θ *(1 − 
δ 
) 
(1 )* 
, − θ δ 
is more adequate for applications with less time 
allocated for schema optimization, and scheme 4.2, 
Rθ 
, θ *δ is very efficient if we allocate for optimization 
enough time. However, the new generic scheme θ 
Rδ , 
introduced in section 3, outperforms the other schemes 
in terms of speed and qualitative results in the learning 
process. 
There are many possibilities for choosing the form of 
parameters in generic scheme 2 
1 Rγ 
γ such that the 
conditions (11) are satisfied. Breeder genetic algorithm, 
presented in section 4, can be used for optimization of 
parameters values regardless of choice of 1 2 γ ,γ . 
The graphical results obtained suggest than 
1 2 γ =δ ,γ =θ , with 0 <δ ,θ <1 give better results 
than other more complicated choices. As further 
directions of study we want to rigorous prove or to 
invalidate this conjecture. 
References: 
[1] N. Baba, New Topics in Learning Automata: Theory 
and Applications, Lecture Notes in Control and 
Information Sciences, Berlin, Germany: Springer- 
Verlag, pp.750-758, 1984. 
[2] O. Buffet, A. Dutech, and F. Charpillet, Incremental 
reinforcement learning for designing multi-agent 
systems, In J. P. Müller, E. Andre, S. Sen, and C. 
Frasson, editors, Proceedings of the Fifth International 
Conference onAutonomous Agents, Montreal, Canada, 
ACM Press, pp. 31–38, 2001. 
[3] M. Dorigo, Introduction to the Special Issue on 
Learning Autonomous Robots, IEEE Trans. on Systems, 
Man and Cybernetics - part B, Vol. 26, No. 3, pp. 361- 
364, 1996. 
[4] S. Lakshmivarahan, M.A.L. Thathachar, Absolutely 
Expedient Learning Algorithms for Stochastic 
Automata, IEEE Transactions on Systems, Man and 
Cybernetics, vol. SMC-6, pp. 281-286, 1973. 
[5] J. Moody, Y. Liu, M. Saffell, and K. Youn. 
Stochastic direct reinforcement: Application to simple 
games with recurrence, In Proceedings of Artificial 
Multiagent Learning. Papers from the 2004 AAAI Fall 
Symposium, Technical Report FS-04-02. 
[6] H. Mühlenbein, D. Schlierkamp-Voosen, The science 
of breeding and its application to the breeder genetic 
algorithm, Evolutionary Computation, vol. 1, pp. 335- 
360, 1994. 
[7] K. S. Narendra, M. A. L. Thathachar, Learning 
Automata: an introduction, Prentice-Hall, 1989. 
[8] C. Rivero, Characterization of the absolutely 
expedient learning algorithms for stochastic automata in 
a non-discrete space of actions, ESANN'2003 
proceedings - European Symposium on Artificial Neural 
Networks Bruges (Belgium), pp. 307-312, 2003. 
[9] D. Simian, F. Stoica, A New Nonlinear 
Reinforcement Scheme for Stochastic Learning 
Automata, Proceedings of the 12th WSEAS International 
Conference on Automatic Control, Modelling & 
Simulation, Catania, Italy, pp. 450-454, 2010. 
[10] F. Stoica, E. M. Popa, An Absolutely Expedient 
Learning Algorithm for Stochastic Automata, WSEAS 
Transactions on Computers, Issue 2, Volume 6, pp. 229- 
235, 2007. 
[11] F. Stoica, D. Simian, Automatic control based on 
Wasp Behavioral Model and Stochastic Learning 
Automata. Mathematics and Computers in Science and 
Engineering Series, Proceedings of 10th WSEAS 
Conference on Mathematical Methods, Computational 
ISBN: 978-960-474-297-4 336
Proceedings of the European Computing Conference 
Techniques and Intelligent Systems (MAMECTIS '08), 
Corfu 2008, WSEAS Press, pp. 289-295, 2008. 
[12] F. Stoica, E. M. Popa, I. Pah, A new reinforcement 
scheme for stochastic learning automata – Application to 
Automatic Control, Proceedings of the International 
Conference on e-Business, Porto, Portugal, pp. 45-50, 
2008. 
[13] F. Stoica, D. Simian, Optimizing a New Nonlinear 
Reinforcement Scheme with Breeder genetic algorithm, 
Proceedings of the 11th WSEAS International 
Conference on Evolutionary Computing (EC'10), Iaşi, 
Romania, pp. 273-278, 2010. 
[14] R. Sutton, A. Barto, Reinforcement learning: An 
introduction, MIT-press, Cambridge, MA, 1998. 
[15] C. Ünsal, P. Kachroo, J. S. Bay, Simulation Study 
of Learning Automata Games in Automated Highway 
Systems, 1st IEEE Conference on Intelligent 
Transportation Systems (ITSC’97), Boston, 
Massachusetts, 1997 
[16] C. Ünsal, P. Kachroo, J. S. Bay, Simulation Study 
of Multiple Intelligent Vehicle Control using Stochastic 
Learning Automata, IEEE Transactions on Systems, 
Man and Cybernetics – Part A, Systems and Human, 
pp.1-42, 1997. 
[17] C. Ünsal, Intelligent Navigation of Autonomous 
Vehicles in an Automated Highway System: Learning 
Methods and Interacting Vehicles Approach, dissertation 
thesis, Pittsburg University, Virginia, USA, 1997. 
ISBN: 978-960-474-297-4 337

Weitere ähnliche Inhalte

Was ist angesagt?

Understanding Blackbox Prediction via Influence Functions
Understanding Blackbox Prediction via Influence FunctionsUnderstanding Blackbox Prediction via Influence Functions
Understanding Blackbox Prediction via Influence FunctionsSEMINARGROOT
 
A Course in Fuzzy Systems and Control Matlab Chapter Three
A Course in Fuzzy Systems and Control Matlab Chapter ThreeA Course in Fuzzy Systems and Control Matlab Chapter Three
A Course in Fuzzy Systems and Control Matlab Chapter ThreeChung Hua Universit
 
Sampling method : MCMC
Sampling method : MCMCSampling method : MCMC
Sampling method : MCMCSEMINARGROOT
 
Numerical solution of fuzzy differential equations by Milne’s predictor-corre...
Numerical solution of fuzzy differential equations by Milne’s predictor-corre...Numerical solution of fuzzy differential equations by Milne’s predictor-corre...
Numerical solution of fuzzy differential equations by Milne’s predictor-corre...mathsjournal
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics JCMwave
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsA machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsJCMwave
 
Numerical
NumericalNumerical
Numerical1821986
 
Introductory maths analysis chapter 14 official
Introductory maths analysis   chapter 14 officialIntroductory maths analysis   chapter 14 official
Introductory maths analysis chapter 14 officialEvert Sandye Taasiringan
 
Time series analysis, modeling and applications
Time series analysis, modeling and applicationsTime series analysis, modeling and applications
Time series analysis, modeling and applicationsSpringer
 
The algebraic techniques module4
The algebraic techniques module4The algebraic techniques module4
The algebraic techniques module4REYEMMANUELILUMBA
 
Gamma & Beta functions
Gamma & Beta functionsGamma & Beta functions
Gamma & Beta functionsSelvaraj John
 
Introductory maths analysis chapter 17 official
Introductory maths analysis   chapter 17 officialIntroductory maths analysis   chapter 17 official
Introductory maths analysis chapter 17 officialEvert Sandye Taasiringan
 
Introductory maths analysis chapter 10 official
Introductory maths analysis   chapter 10 officialIntroductory maths analysis   chapter 10 official
Introductory maths analysis chapter 10 officialEvert Sandye Taasiringan
 
Numerical Methods
Numerical MethodsNumerical Methods
Numerical MethodsTeja Ande
 
HMM-Based Speech Synthesis
HMM-Based Speech SynthesisHMM-Based Speech Synthesis
HMM-Based Speech SynthesisIJMER
 

Was ist angesagt? (18)

Understanding Blackbox Prediction via Influence Functions
Understanding Blackbox Prediction via Influence FunctionsUnderstanding Blackbox Prediction via Influence Functions
Understanding Blackbox Prediction via Influence Functions
 
A Course in Fuzzy Systems and Control Matlab Chapter Three
A Course in Fuzzy Systems and Control Matlab Chapter ThreeA Course in Fuzzy Systems and Control Matlab Chapter Three
A Course in Fuzzy Systems and Control Matlab Chapter Three
 
Sampling method : MCMC
Sampling method : MCMCSampling method : MCMC
Sampling method : MCMC
 
Numerical solution of fuzzy differential equations by Milne’s predictor-corre...
Numerical solution of fuzzy differential equations by Milne’s predictor-corre...Numerical solution of fuzzy differential equations by Milne’s predictor-corre...
Numerical solution of fuzzy differential equations by Milne’s predictor-corre...
 
SIAMSEAS2015
SIAMSEAS2015SIAMSEAS2015
SIAMSEAS2015
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsA machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics
 
Finite frequency H∞ control for wind turbine systems in T-S form
Finite frequency H∞ control for wind turbine systems in T-S formFinite frequency H∞ control for wind turbine systems in T-S form
Finite frequency H∞ control for wind turbine systems in T-S form
 
Numerical
NumericalNumerical
Numerical
 
Introductory maths analysis chapter 14 official
Introductory maths analysis   chapter 14 officialIntroductory maths analysis   chapter 14 official
Introductory maths analysis chapter 14 official
 
Time series analysis, modeling and applications
Time series analysis, modeling and applicationsTime series analysis, modeling and applications
Time series analysis, modeling and applications
 
The algebraic techniques module4
The algebraic techniques module4The algebraic techniques module4
The algebraic techniques module4
 
RTSP Report
RTSP ReportRTSP Report
RTSP Report
 
Gamma & Beta functions
Gamma & Beta functionsGamma & Beta functions
Gamma & Beta functions
 
Introductory maths analysis chapter 17 official
Introductory maths analysis   chapter 17 officialIntroductory maths analysis   chapter 17 official
Introductory maths analysis chapter 17 official
 
Introductory maths analysis chapter 10 official
Introductory maths analysis   chapter 10 officialIntroductory maths analysis   chapter 10 official
Introductory maths analysis chapter 10 official
 
Numerical Methods
Numerical MethodsNumerical Methods
Numerical Methods
 
HMM-Based Speech Synthesis
HMM-Based Speech SynthesisHMM-Based Speech Synthesis
HMM-Based Speech Synthesis
 

Andere mochten auch

A general frame for building optimal multiple SVM kernels
A general frame for building optimal multiple SVM kernelsA general frame for building optimal multiple SVM kernels
A general frame for building optimal multiple SVM kernelsinfopapers
 
A new Reinforcement Scheme for Stochastic Learning Automata
A new Reinforcement Scheme for Stochastic Learning AutomataA new Reinforcement Scheme for Stochastic Learning Automata
A new Reinforcement Scheme for Stochastic Learning Automatainfopapers
 
A Distributed CTL Model Checker
A Distributed CTL Model CheckerA Distributed CTL Model Checker
A Distributed CTL Model Checkerinfopapers
 
Algebraic Approach to Implementing an ATL Model Checker
Algebraic Approach to Implementing an ATL Model CheckerAlgebraic Approach to Implementing an ATL Model Checker
Algebraic Approach to Implementing an ATL Model Checkerinfopapers
 
Using genetic algorithms and simulation as decision support in marketing stra...
Using genetic algorithms and simulation as decision support in marketing stra...Using genetic algorithms and simulation as decision support in marketing stra...
Using genetic algorithms and simulation as decision support in marketing stra...infopapers
 
An Executable Actor Model in Abstract State Machine Language
An Executable Actor Model in Abstract State Machine LanguageAn Executable Actor Model in Abstract State Machine Language
An Executable Actor Model in Abstract State Machine Languageinfopapers
 
Modeling the Broker Behavior Using a BDI Agent
Modeling the Broker Behavior Using a BDI AgentModeling the Broker Behavior Using a BDI Agent
Modeling the Broker Behavior Using a BDI Agentinfopapers
 
A new co-mutation genetic operator
A new co-mutation genetic operatorA new co-mutation genetic operator
A new co-mutation genetic operatorinfopapers
 
Using the Breeder GA to Optimize a Multiple Regression Analysis Model
Using the Breeder GA to Optimize a Multiple Regression Analysis ModelUsing the Breeder GA to Optimize a Multiple Regression Analysis Model
Using the Breeder GA to Optimize a Multiple Regression Analysis Modelinfopapers
 
Intelligent agents in ontology-based applications
Intelligent agents in ontology-based applicationsIntelligent agents in ontology-based applications
Intelligent agents in ontology-based applicationsinfopapers
 
Implementing an ATL Model Checker tool using Relational Algebra concepts
Implementing an ATL Model Checker tool using Relational Algebra conceptsImplementing an ATL Model Checker tool using Relational Algebra concepts
Implementing an ATL Model Checker tool using Relational Algebra conceptsinfopapers
 
Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...
Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...
Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...infopapers
 
A new Evolutionary Reinforcement Scheme for Stochastic Learning Automata
A new Evolutionary Reinforcement Scheme for Stochastic Learning AutomataA new Evolutionary Reinforcement Scheme for Stochastic Learning Automata
A new Evolutionary Reinforcement Scheme for Stochastic Learning Automatainfopapers
 
An AsmL model for an Intelligent Vehicle Control System
An AsmL model for an Intelligent Vehicle Control SystemAn AsmL model for an Intelligent Vehicle Control System
An AsmL model for an Intelligent Vehicle Control Systeminfopapers
 
Building a new CTL model checker using Web Services
Building a new CTL model checker using Web ServicesBuilding a new CTL model checker using Web Services
Building a new CTL model checker using Web Servicesinfopapers
 
Deliver Dynamic and Interactive Web Content in J2EE Applications
Deliver Dynamic and Interactive Web Content in J2EE ApplicationsDeliver Dynamic and Interactive Web Content in J2EE Applications
Deliver Dynamic and Interactive Web Content in J2EE Applicationsinfopapers
 
Building a Web-bridge for JADE agents
Building a Web-bridge for JADE agentsBuilding a Web-bridge for JADE agents
Building a Web-bridge for JADE agentsinfopapers
 

Andere mochten auch (17)

A general frame for building optimal multiple SVM kernels
A general frame for building optimal multiple SVM kernelsA general frame for building optimal multiple SVM kernels
A general frame for building optimal multiple SVM kernels
 
A new Reinforcement Scheme for Stochastic Learning Automata
A new Reinforcement Scheme for Stochastic Learning AutomataA new Reinforcement Scheme for Stochastic Learning Automata
A new Reinforcement Scheme for Stochastic Learning Automata
 
A Distributed CTL Model Checker
A Distributed CTL Model CheckerA Distributed CTL Model Checker
A Distributed CTL Model Checker
 
Algebraic Approach to Implementing an ATL Model Checker
Algebraic Approach to Implementing an ATL Model CheckerAlgebraic Approach to Implementing an ATL Model Checker
Algebraic Approach to Implementing an ATL Model Checker
 
Using genetic algorithms and simulation as decision support in marketing stra...
Using genetic algorithms and simulation as decision support in marketing stra...Using genetic algorithms and simulation as decision support in marketing stra...
Using genetic algorithms and simulation as decision support in marketing stra...
 
An Executable Actor Model in Abstract State Machine Language
An Executable Actor Model in Abstract State Machine LanguageAn Executable Actor Model in Abstract State Machine Language
An Executable Actor Model in Abstract State Machine Language
 
Modeling the Broker Behavior Using a BDI Agent
Modeling the Broker Behavior Using a BDI AgentModeling the Broker Behavior Using a BDI Agent
Modeling the Broker Behavior Using a BDI Agent
 
A new co-mutation genetic operator
A new co-mutation genetic operatorA new co-mutation genetic operator
A new co-mutation genetic operator
 
Using the Breeder GA to Optimize a Multiple Regression Analysis Model
Using the Breeder GA to Optimize a Multiple Regression Analysis ModelUsing the Breeder GA to Optimize a Multiple Regression Analysis Model
Using the Breeder GA to Optimize a Multiple Regression Analysis Model
 
Intelligent agents in ontology-based applications
Intelligent agents in ontology-based applicationsIntelligent agents in ontology-based applications
Intelligent agents in ontology-based applications
 
Implementing an ATL Model Checker tool using Relational Algebra concepts
Implementing an ATL Model Checker tool using Relational Algebra conceptsImplementing an ATL Model Checker tool using Relational Algebra concepts
Implementing an ATL Model Checker tool using Relational Algebra concepts
 
Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...
Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...
Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...
 
A new Evolutionary Reinforcement Scheme for Stochastic Learning Automata
A new Evolutionary Reinforcement Scheme for Stochastic Learning AutomataA new Evolutionary Reinforcement Scheme for Stochastic Learning Automata
A new Evolutionary Reinforcement Scheme for Stochastic Learning Automata
 
An AsmL model for an Intelligent Vehicle Control System
An AsmL model for an Intelligent Vehicle Control SystemAn AsmL model for an Intelligent Vehicle Control System
An AsmL model for an Intelligent Vehicle Control System
 
Building a new CTL model checker using Web Services
Building a new CTL model checker using Web ServicesBuilding a new CTL model checker using Web Services
Building a new CTL model checker using Web Services
 
Deliver Dynamic and Interactive Web Content in J2EE Applications
Deliver Dynamic and Interactive Web Content in J2EE ApplicationsDeliver Dynamic and Interactive Web Content in J2EE Applications
Deliver Dynamic and Interactive Web Content in J2EE Applications
 
Building a Web-bridge for JADE agents
Building a Web-bridge for JADE agentsBuilding a Web-bridge for JADE agents
Building a Web-bridge for JADE agents
 

Ähnlich wie Generic Reinforcement Schemes and Their Optimization

ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZHEN...
ACTIVE CONTROLLER DESIGN FOR THE HYBRID  SYNCHRONIZATION OF HYPERCHAOTIC ZHEN...ACTIVE CONTROLLER DESIGN FOR THE HYBRID  SYNCHRONIZATION OF HYPERCHAOTIC ZHEN...
ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZHEN...ijscai
 
ADAPTIVESYNCHRONIZER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZH...
ADAPTIVESYNCHRONIZER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZH...ADAPTIVESYNCHRONIZER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZH...
ADAPTIVESYNCHRONIZER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZH...ijitcs
 
Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...
Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...
Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...infopapers
 
International Journal of Instrumentation and Control Systems (IJICS)
International Journal of Instrumentation and Control Systems (IJICS)International Journal of Instrumentation and Control Systems (IJICS)
International Journal of Instrumentation and Control Systems (IJICS)ijcisjournal
 
ADAPTIVE CONTROLLER DESIGN FOR THE ANTI-SYNCHRONIZATION OF HYPERCHAOTIC YANG ...
ADAPTIVE CONTROLLER DESIGN FOR THE ANTI-SYNCHRONIZATION OF HYPERCHAOTIC YANG ...ADAPTIVE CONTROLLER DESIGN FOR THE ANTI-SYNCHRONIZATION OF HYPERCHAOTIC YANG ...
ADAPTIVE CONTROLLER DESIGN FOR THE ANTI-SYNCHRONIZATION OF HYPERCHAOTIC YANG ...ijics
 
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...TELKOMNIKA JOURNAL
 
Adaptive Controller and Synchronizer Design for Hyperchaotic Zhou System with...
Adaptive Controller and Synchronizer Design for Hyperchaotic Zhou System with...Adaptive Controller and Synchronizer Design for Hyperchaotic Zhou System with...
Adaptive Controller and Synchronizer Design for Hyperchaotic Zhou System with...Zac Darcy
 
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...ijistjournal
 
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...ijistjournal
 
ADAPTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU ...
ADAPTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU ...ADAPTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU ...
ADAPTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU ...ijait
 
ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU AN...
ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU AN...ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU AN...
ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU AN...Zac Darcy
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfssusera1eccd
 
ADAPTIVE STABILIZATION AND SYNCHRONIZATION OF HYPERCHAOTIC QI SYSTEM
ADAPTIVE STABILIZATION AND SYNCHRONIZATION OF HYPERCHAOTIC QI SYSTEM ADAPTIVE STABILIZATION AND SYNCHRONIZATION OF HYPERCHAOTIC QI SYSTEM
ADAPTIVE STABILIZATION AND SYNCHRONIZATION OF HYPERCHAOTIC QI SYSTEM cseij
 
Adaptive Stabilization and Synchronization of Hyperchaotic QI System
Adaptive Stabilization and Synchronization of Hyperchaotic QI SystemAdaptive Stabilization and Synchronization of Hyperchaotic QI System
Adaptive Stabilization and Synchronization of Hyperchaotic QI SystemCSEIJJournal
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningRyo Iwaki
 
Reliability Importance in Weighted-k-out-of-n Systems
Reliability Importance in Weighted-k-out-of-n SystemsReliability Importance in Weighted-k-out-of-n Systems
Reliability Importance in Weighted-k-out-of-n Systemsbozbulut1
 
Adaptive Control Scheme with Parameter Adaptation - From Human Motor Control ...
Adaptive Control Scheme with Parameter Adaptation - From Human Motor Control ...Adaptive Control Scheme with Parameter Adaptation - From Human Motor Control ...
Adaptive Control Scheme with Parameter Adaptation - From Human Motor Control ...toukaigi
 
The Generalized Difference Operator of the 퐧 퐭퐡 Kind
The Generalized Difference Operator of the 퐧 퐭퐡 KindThe Generalized Difference Operator of the 퐧 퐭퐡 Kind
The Generalized Difference Operator of the 퐧 퐭퐡 KindDr. Amarjeet Singh
 
ANTI-SYNCHRONIZATION OF HYPERCHAOTIC WANG AND HYPERCHAOTIC LI SYSTEMS WITH UN...
ANTI-SYNCHRONIZATION OF HYPERCHAOTIC WANG AND HYPERCHAOTIC LI SYSTEMS WITH UN...ANTI-SYNCHRONIZATION OF HYPERCHAOTIC WANG AND HYPERCHAOTIC LI SYSTEMS WITH UN...
ANTI-SYNCHRONIZATION OF HYPERCHAOTIC WANG AND HYPERCHAOTIC LI SYSTEMS WITH UN...ijcseit
 

Ähnlich wie Generic Reinforcement Schemes and Their Optimization (20)

ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZHEN...
ACTIVE CONTROLLER DESIGN FOR THE HYBRID  SYNCHRONIZATION OF HYPERCHAOTIC ZHEN...ACTIVE CONTROLLER DESIGN FOR THE HYBRID  SYNCHRONIZATION OF HYPERCHAOTIC ZHEN...
ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZHEN...
 
ADAPTIVESYNCHRONIZER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZH...
ADAPTIVESYNCHRONIZER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZH...ADAPTIVESYNCHRONIZER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZH...
ADAPTIVESYNCHRONIZER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZH...
 
Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...
Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...
Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...
 
International Journal of Instrumentation and Control Systems (IJICS)
International Journal of Instrumentation and Control Systems (IJICS)International Journal of Instrumentation and Control Systems (IJICS)
International Journal of Instrumentation and Control Systems (IJICS)
 
ADAPTIVE CONTROLLER DESIGN FOR THE ANTI-SYNCHRONIZATION OF HYPERCHAOTIC YANG ...
ADAPTIVE CONTROLLER DESIGN FOR THE ANTI-SYNCHRONIZATION OF HYPERCHAOTIC YANG ...ADAPTIVE CONTROLLER DESIGN FOR THE ANTI-SYNCHRONIZATION OF HYPERCHAOTIC YANG ...
ADAPTIVE CONTROLLER DESIGN FOR THE ANTI-SYNCHRONIZATION OF HYPERCHAOTIC YANG ...
 
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
 
Adaptive Controller and Synchronizer Design for Hyperchaotic Zhou System with...
Adaptive Controller and Synchronizer Design for Hyperchaotic Zhou System with...Adaptive Controller and Synchronizer Design for Hyperchaotic Zhou System with...
Adaptive Controller and Synchronizer Design for Hyperchaotic Zhou System with...
 
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
 
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
GLOBAL CHAOS SYNCHRONIZATION OF UNCERTAIN LORENZ-STENFLO AND QI 4-D CHAOTIC S...
 
ADAPTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU ...
ADAPTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU ...ADAPTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU ...
ADAPTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU ...
 
ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU AN...
ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU AN...ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU AN...
ACTIVE CONTROLLER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC XU AN...
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdf
 
ADAPTIVE STABILIZATION AND SYNCHRONIZATION OF HYPERCHAOTIC QI SYSTEM
ADAPTIVE STABILIZATION AND SYNCHRONIZATION OF HYPERCHAOTIC QI SYSTEM ADAPTIVE STABILIZATION AND SYNCHRONIZATION OF HYPERCHAOTIC QI SYSTEM
ADAPTIVE STABILIZATION AND SYNCHRONIZATION OF HYPERCHAOTIC QI SYSTEM
 
Adaptive Stabilization and Synchronization of Hyperchaotic QI System
Adaptive Stabilization and Synchronization of Hyperchaotic QI SystemAdaptive Stabilization and Synchronization of Hyperchaotic QI System
Adaptive Stabilization and Synchronization of Hyperchaotic QI System
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
A0280106
A0280106A0280106
A0280106
 
Reliability Importance in Weighted-k-out-of-n Systems
Reliability Importance in Weighted-k-out-of-n SystemsReliability Importance in Weighted-k-out-of-n Systems
Reliability Importance in Weighted-k-out-of-n Systems
 
Adaptive Control Scheme with Parameter Adaptation - From Human Motor Control ...
Adaptive Control Scheme with Parameter Adaptation - From Human Motor Control ...Adaptive Control Scheme with Parameter Adaptation - From Human Motor Control ...
Adaptive Control Scheme with Parameter Adaptation - From Human Motor Control ...
 
The Generalized Difference Operator of the 퐧 퐭퐡 Kind
The Generalized Difference Operator of the 퐧 퐭퐡 KindThe Generalized Difference Operator of the 퐧 퐭퐡 Kind
The Generalized Difference Operator of the 퐧 퐭퐡 Kind
 
ANTI-SYNCHRONIZATION OF HYPERCHAOTIC WANG AND HYPERCHAOTIC LI SYSTEMS WITH UN...
ANTI-SYNCHRONIZATION OF HYPERCHAOTIC WANG AND HYPERCHAOTIC LI SYSTEMS WITH UN...ANTI-SYNCHRONIZATION OF HYPERCHAOTIC WANG AND HYPERCHAOTIC LI SYSTEMS WITH UN...
ANTI-SYNCHRONIZATION OF HYPERCHAOTIC WANG AND HYPERCHAOTIC LI SYSTEMS WITH UN...
 

Mehr von infopapers

A New Model Checking Tool
A New Model Checking ToolA New Model Checking Tool
A New Model Checking Toolinfopapers
 
CTL Model Update Implementation Using ANTLR Tools
CTL Model Update Implementation Using ANTLR ToolsCTL Model Update Implementation Using ANTLR Tools
CTL Model Update Implementation Using ANTLR Toolsinfopapers
 
Generating JADE agents from SDL specifications
Generating JADE agents from SDL specificationsGenerating JADE agents from SDL specifications
Generating JADE agents from SDL specificationsinfopapers
 
An evolutionary method for constructing complex SVM kernels
An evolutionary method for constructing complex SVM kernelsAn evolutionary method for constructing complex SVM kernels
An evolutionary method for constructing complex SVM kernelsinfopapers
 
Evaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsEvaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsinfopapers
 
Interoperability issues in accessing databases through Web Services
Interoperability issues in accessing databases through Web ServicesInteroperability issues in accessing databases through Web Services
Interoperability issues in accessing databases through Web Servicesinfopapers
 
Using Ontology in Electronic Evaluation for Personalization of eLearning Systems
Using Ontology in Electronic Evaluation for Personalization of eLearning SystemsUsing Ontology in Electronic Evaluation for Personalization of eLearning Systems
Using Ontology in Electronic Evaluation for Personalization of eLearning Systemsinfopapers
 
Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...
Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...
Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...infopapers
 
An executable model for an Intelligent Vehicle Control System
An executable model for an Intelligent Vehicle Control SystemAn executable model for an Intelligent Vehicle Control System
An executable model for an Intelligent Vehicle Control Systeminfopapers
 

Mehr von infopapers (9)

A New Model Checking Tool
A New Model Checking ToolA New Model Checking Tool
A New Model Checking Tool
 
CTL Model Update Implementation Using ANTLR Tools
CTL Model Update Implementation Using ANTLR ToolsCTL Model Update Implementation Using ANTLR Tools
CTL Model Update Implementation Using ANTLR Tools
 
Generating JADE agents from SDL specifications
Generating JADE agents from SDL specificationsGenerating JADE agents from SDL specifications
Generating JADE agents from SDL specifications
 
An evolutionary method for constructing complex SVM kernels
An evolutionary method for constructing complex SVM kernelsAn evolutionary method for constructing complex SVM kernels
An evolutionary method for constructing complex SVM kernels
 
Evaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsEvaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernels
 
Interoperability issues in accessing databases through Web Services
Interoperability issues in accessing databases through Web ServicesInteroperability issues in accessing databases through Web Services
Interoperability issues in accessing databases through Web Services
 
Using Ontology in Electronic Evaluation for Personalization of eLearning Systems
Using Ontology in Electronic Evaluation for Personalization of eLearning SystemsUsing Ontology in Electronic Evaluation for Personalization of eLearning Systems
Using Ontology in Electronic Evaluation for Personalization of eLearning Systems
 
Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...
Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...
Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...
 
An executable model for an Intelligent Vehicle Control System
An executable model for an Intelligent Vehicle Control SystemAn executable model for an Intelligent Vehicle Control System
An executable model for an Intelligent Vehicle Control System
 

Kürzlich hochgeladen

Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 

Kürzlich hochgeladen (20)

Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 

Generic Reinforcement Schemes and Their Optimization

  • 1. Proceedings of the European Computing Conference Generic Reinforcement Schemes and Their Optimization DANA SIMIAN, FLORIN STOICA Department of Informatics “Lucian Blaga” University of Sibiu Str. Dr. Ion Ratiu 5-7, 550012, Sibiu ROMANIA dana.simian@ulbsibiu.ro, florin.stoica@ulbsibiu.ro Abstract: - The aim of this paper is to introduce a generic two-parameters dependent absolutely expedient reinforcement scheme and to present a method for learning parameters optimization. We optimize, using a Breeder genetic algorithm, many schemes derived from our generic one, in order to reach the best performance. Furthermore, we compare our results in terms of speed and efficiency. Key-Words: - Reinforcement Learning, Breeder genetic algorithm, Optimization 1 Introduction Reinforcement schemes represent algorithms which realize the learning process for stochastic learning automata. Stochastic learning automata adapt to changes in their environment as results of a reinforcement learning process. Given a set of possible actions, a stochastic learning automaton must choose the optimal one, based on the environment response and the past actions. Initially equal probabilities are associated to all possible actions, one action is selected at random and the actions probabilities are updated based on the environment response. A detailed characterization of reinforcement learning can be found in [14]. In [7] is underlined that the major advantage of reinforcement learning is that it needs information about the environment only for the reinforcement signal. Reinforcement learning has several applications in autonomic robotics, designing multi-agent systems, intelligent vehicles control, etc. ([2], [3], [5], [15], [16]). In [11] we designed a simulator of an intelligent vehicle control system. The system was based on two learning automata. In other articles ([9], [13]), we defined new reinforcement schemes in order to reach a best performance of our system. Usually, reinforcement schemes depends on many parameters, as we can see in section 3. An important problem is to choose the optimal scheme’s parameters. The aim of this paper is to introduce a generic reinforcement scheme from which many other reinforcement schems can be obtained and to present an optimization method of these schemes with respect to learning parameters. We also optimize this new scheme and those that we introduced in [9], [12]. We evaluate and compare our schemes using two criteria: the speed of the optimization process and the efficiency of the optimized schema. The remainder of this paper is organized as follows. In section 2 we briefly present the mathematical backgrounds of stochastic learning automata with variable structure. In section 3 is presented our generic absolutely expedient reinforcement scheme, together with other particular schemes derived from it. In section 3 we present our optimization method for reinforcement schemes learning parameters and analyze the provided results. Conclusions and further directions of study are presented in section 5. 2 Mathematical backgrounds of stochastic automata A stochastic automaton supposes the existence of a set of actions, which define the input of the environment and a response set. The range of the response values depends on the model we chose. There are three different models for representation of the response values: P-model, S-model and Q-model. The P-model uses a set of binary values, 0 or 1. In the S-model the response values are continuous in the range (0, 1). In the Q-model the response set is a finite set of discrete values in the range (0, 1). In this paper we use the P-model for our reinforcement schemes. A stochastic automaton selects one action at random, observes the response from the environment and updates the action probabilities based on that response. An action can be rewarded or punished using a set of penalties probabilities. Mathematical model of a stochastic automaton is defined by a triple {α , c,β } corresponding to the elements presented before: a) α ={α1 ,α 2 ,...,α r } - the input actions of the environment b) β - the response set. In the case of P-model, { 1 , 2} β = β β is a binary set: ISBN: 978-960-474-297-4 332
  • 2. Proceedings of the European Computing Conference β = 0 is a favourable outcome and β =1 is an unfavourable outcome. To reefer the time instant is used the notationα (n) , β (n) . c) c ={c1 , c2 ,..., cr } - the set of penalty probabilities. The element i c is the probability that action i α will result in an unfavourable response: ci = P(β (n) =1|α (n) =α i ) i =1, 2, ..., r The evolution in time of penalty probabilities defines two types of environments: stationary (the penalty probabilities are constant over time) and nonstationary (the penalties change over time). In the following we consider only stationary random environments. The action probabilities vector at time moment n+1 is updated using a mapping T and the current probabilities pi (n) = P(α (n) =α i ), i =1, r : p(n +1) = T[ p(n),α (n),β (n)] Reinforcement schemes are named linear if p(n +1) is a linear function of p(n) , and nonlinear otherwise. The evaluation of performances of a learning automaton is made using a quantitative norm of behavior ([17]) represented by the average penalty for a given action probability vector, M(n). M n P n p n ( ) ( ( ) 1| ( )) = β = = r r =Σ = = ∗ = =Σ P n n P n cpn ( β ( ) 1| α ( ) α ) ( α ( ) α ) ( ) i i i i i i 1 1 = = The only class of reinforcement schemes for which necessary and sufficient conditions of design are available is represented by absolutely expedient learning schemes, defined in [7]. An automaton is absolutely expedient if M(n +1) < M(n) for all n ([7]). The general solution for absolutely expedient schemes was found by Lakshmivarahan and Thathachar in [4]. Other studies about expedient learning algorithms can be found in [8]. In [17] is presented a nonlinear absolutely expedient reinforcement scheme, for a stationary N-teacher P-model environment. In the case of N-teacher model, if the automaton produced the action i α and the responses from environments (or “teachers”) are denoted by j j N β i =1,..., , then the updating rules are: ⎤ ( 1) ( ) 1 β φ ( ( )) Σ Σ ≠ = = − ∗ ⎥⎦ ⎡ ⎢⎣ + = + r j j i j N k k i i i p n N p n p n 1 1 ⎤ 1 1 β ψ ( ( )) (1) Σ Σ ≠ = = ∗ ⎥⎦ ⎡ − − ⎢⎣ r j j i j N k k i p n N 1 1 ( 1) ( ) 1 ( ( )) + = − ⎡ ⎤ ∗ + ⎢⎣ ⎥⎦ p n p n p n 1 1 1 ( ( )), . ⎡ ⎤ + ⎢ − ⎥ ∗ ∀ ≠ ⎣ 1 ⎦ N k j j i j k N k i j k N p n j i N β φ β ψ = = Σ Σ (2) i φ and i ψ satisfy the following conditions: p n p n r λ 1 = = = p n ≤ ( ( )) 0 ( ( )) ( ) ... ( ( )) 1 ( ) p n p n r φ φ (3) p n p n r μ 1 = = = p n ≤ ( ( )) 0 ( ( )) ( ) ... ( ( )) 1 ( ) p n p n r ψ ψ (4) r Σ pi n j p n ( ) + φ ( ( )) > 0 (5) i j j 1 ≠ = r Σ pi n j p n ( ) − ψ ( ( )) < 1 (6) i j j 1 ≠ = p j (n) +ψ j ( p(n)) > 0 (7) p j (n) −φ j ( p(n)) <1 (8) for all j∈{1,..., r} {i} In [1] and [15] is proved that the automaton with the reinforcement scheme given in (1)-(2) is absolutely expedient in a stationary environment if the functions λ ( p(n)) and μ ( p(n)) satisfy the following conditions: λ ( p(n)) ≤ 0 μ ( p(n)) ≤ 0 (9) λ ( p(n)) + μ ( p(n)) < 0 3 Generic absolutely expedient reinforcement scheme In the following we present a generic two-parameter dependent reinforcement schemes and prove that this scheme is absolutely expedient in a stationary environment. We start from the scheme given in (1) – (2). This scheme is also valid for a single-teacher model. In this case we will define a single environment response denoted by f . Thus, the updating rules become: pn pn f Hn pn ( 1) ( ) ( ( )) [1 ( )] (1 ) ( ) [1 ( )] + = + ∗ − ∗ ∗ − − i i 1 i f pn 2 i γ γ − − ∗ − ∗ − pn pn f Hn p n f p n ( 1) ( ) ( ( )) ( ) (1 ) ( ) ( ) + = − ∗ − γ ∗ ∗ 1 ∗ + − ∗ − ∗ γ 2 j j j j (10) for all j ≠ i , i.e.: 2 ( ( )) ( ) k k ψ p n = −γ ∗ p n 1 ( ( )) ( ) ( ) k k φ p n = −γ ∗H n ∗ p n ISBN: 978-960-474-297-4 333
  • 3. Proceedings of the European Computing Conference where learning parameters 1 γ and 2 γ are real values γ 1,γ 2 ∈(0,1) (11) The function H is defined as: ⎧ ( ) min 1; max min ( ) , { { = ⎨ i − ⎩ ∗ (1 − ( )) 1 i H n p n p n ε γ }} ⎛ ⎫ 1 − () ⎞ ⎪ ⎜⎜ − ∗ ⎟⎟ ⎬ ⎝ 1 ⎠ 1, ⎪⎭ ;0 ( ) j j j r j i p n p n ε γ =≠ Parameter ε is an arbitrarily small positive real number. Our reinforcement scheme differs from schemes given in [15]-[17], by the definition of H and φ k . We will show that are satisfied all the conditions of the reinforcement scheme (1) - (2). From (3), (4) we have: p n H n p n Hn pn p n p n ( ( )) ( ) ( ) ( ) ( ( )) ( ) ( ) φ γ − ∗ ∗ = 1 =− ∗ = 1 (3’) k k k k γ λ p n p n p n p n p n ( ( )) ( ) ( ( )) ( ) ( ) ψ γ − ∗ = 2 =− = (4’) 2 k k k k γ μ The conditions (5) – (8) become: p n H n p n H n p n ( ) ( ) (1 ( )) 0 ( ) ( ) − ∗ ∗ − > ⇔ < i 1 i 1 i p n (1 ( )) i γ γ ∗ − (5’) Condition (5’) is satisfied by the definition of the function H(n) . 2 ( ) (1 ( )) 1 i i p n +γ ∗ − p n < (6’) But 2 ( ) (1 ( )) ( ) 1 ( ) 1 i i i i p n +γ ∗ − p n < p n + − p n = since 2 0 <γ <1 2 ( ) ( ) 0, {1,..., }{ } j j p n −γ ∗ p n > ∀j∈ r i (7’) But 2 2 ( ) ( ) ( ) (1 ) 0 j j j p n −γ ∗ p n = p n ∗ −γ > since 2 0 <γ <1 and 0 < p j (n) <1 for all j∈{1,..., r}{i} 1 p n 1 − () 1 ( ) ( ) ( ) 1 ( ) ( ) j j j j p n H n p n H n p n γ γ + ∗ ∗ < ⇔ < ∗ (8’) ∀j∈{1,..., r}{i} . This condition is satisfied by the definition of the function H(n) . Therefore our reinforcement scheme is a candidate for absolute expediency. Furthermore, the functions λ and μ for our nonlinear scheme satisfy: 1 λ ( p(n)) = −γ ∗H(n) ≤ 0 2 μ ( p(n)) = −γ ≤ 0 1 2 λ ( p(n)) +μ ( p(n)) = −γ ∗H(n) −γ < 0 In conclusion, the algorithm given in equations (10) is absolutely expedient in a stationary environment. This algorithm defines a two-parameter dependent generic absolutely expedient reinforcement scheme. We will denote this scheme by 2 1 Rγ γ . Choosing different expressions for the parameters, such that (11) holds, we obtained several absolutely expedient reinforcement schemes. In [9] we introduced and studied the scheme *(1 ) (1 )* Rθ δ − − θ δ , with 0 <θ <1 and 0 <δ <1. Obviously 0 <θ *(1−δ ) <1 and 0 < (1−θ )*δ <1, therefore this is a absolutely expedient reinforcement scheme. In [12] we introduced the scheme θ Rθ *δ , with 0 <θ <1 and 0 <θ ∗δ <1. 4. Optimization of two-parameters reinforcement schemes A very important problem is to find the optimal values of learning parameters in the scheme 2 1 Rγ γ in order to reach the best performance. In [13], we introduced first the idea of learning parameters optimization in a reinforcement scheme using genetic algorithms. We develop here this idea and use a Breeder genetic algorithm, for providing the optimal learning parameters for the generic scheme 2 1 Rγ γ . We also apply the method for the particular schemes presented in section 3. Furthermore, we compare our results in terms of speed and efficiency. For the simplicity of notations, we consider, in our comparisons, the scheme Rθ δ , with 1 Rγ δ ,θ ∈(0,1) instead of 2 γ . The aim is to find optimal values for the learning parameters δ and θ in the schemes: Rθ δ , *(1 ) (1 )* Rθ δ − and − θ δ θ Rθ *δ . Because parameters are real values, we use the Breeder genetic algorithm, proposed by Mühlenbein and Schlierkamp-Voosen in [6], which represents solutions (chromosomes) as vectors of real numbers. This algorithm is closer to the reality than normal genetic algorithms which use discrete representation of solutions. The skeleton of the Breeder genetic algorithm can be found in [13]. The selection is achieved randomly from the T% best elements of current population, where T is a constant of the algorithm (usually, T = 40 provide best results). Thus, within each generation, two elements selected from the T% best chromosomes are subject to crossover operation. On the new child obtained from the mate of the parents is applied the mutation operator. The ISBN: 978-960-474-297-4 334
  • 4. Proceedings of the European Computing Conference process is repeated until are obtained N-1 new individuals, where N represents the size of the initial population. The best chromosome (evaluated through fitness function) is inserted in the new population (1- elitism). Thus, the new population will have also N elements. Let be 1,..., { }i i n x x = = and 1,..., { }i i n y y = = two chromosomes. The Breeder crossover operator gives a new chromosome z, whose genes are represented by ( ) i i i i i z x y x α = + − , i=1,…,n, with i α a random variable uniformly distributed between [−ε ,1+ε ], ε depends on the problem to be solved and typically is in the interval [0,0.5] . The mutation operator gives i i i i i x = x + s ⋅ r ⋅a , i=1,…n with { 1,1} i s ∈ − uniform at random, i xi r = r ⋅ domain , r∈[0.1, 0.5] (typically 0.1) , 2 k i a = − ⋅α withα ∈[0,1] uniform at random and k is the number of bytes used to represent a number in the machine within is executed the Breeder algorithm (mutation precision). The probability of mutation is typically choosed as 1/ n . In order to find the best values for learning parameters δ and θ of our reinforcement schemes and to compare the results, we consider the same example we used in [9], [13]. We used our reinforcement schemes for robot navigation in the grid world presented in Fig. 1. The current position of the robot is marked by a circle. Navigation is done using four actionsα ={N, S, E,W} , corresponding to the four possible movements along the coordinate directions. Fig. 1. Grid world for robot navigation We have a single optimal action (movement to S). In the learning process, only this action receives reward. Initially, we choose for the optimal action a small probability value (0.0005). We stop the execution when the probability of the optimal action, popt, reaches a certain value (popt=0.9999). We make the performance evaluation of our schemes using the “number of steps” of the learning algorithm until the stop condition is achieved. Using the Breeder genetic algorithm, we can provide the optimal learning parameters for our schemes, in order to reach the best performance. Each chromosome contains two genes, representing the real values δ and θ . The fitness function for chromosomes evaluation is represented by the number of steps necessary by the learning process to reach the value 0.9999 for the probability of the optimal action. The parameters of Breeder algorithm are assigned with following values: δ = 0 , r = 0.1, k = 8 . The initial population has 400 chromosomes and algorithm is stopped after 1000 generations. The results provided by the Breeder genetic algorithm are presented in Table 1. Optimal values for learning parameters provided by the Breeder algorithm 4 actions with (0) 0.0005, (0) = 0.9995 / 3 = opt p i≠opt p Scheme 4.1 θ Rδ Scheme 4.2 θ Rθ *δ Scheme 4.3 *(1 ) (1 )* θ − δ R − θ δ δ 0.5866 0.7036 0.5741 θ 0.9469 0.8983 0.3640 Average number of steps to reach 16.95 16.98 43.70 popt=0.9999 Table 1. Optimal values for learning parameters provided by the Breeder genetic algorithm Fig. 2. Schema optimization vs. time passed In figure 2 is presented the optimization process for reinforcement schemes analyzed in Table 1, using two dimensions of data: the time passed vs. the performance evaluation of optimized scheme (number of steps necessary to reach the stop conditions of the learning process). In figure 3 is presented the optimization process using as dimensions of data the number of generations in the Breeder algorithm vs. the performance evaluation of optimized scheme. ISBN: 978-960-474-297-4 335
  • 5. Proceedings of the European Computing Conference Fig. 3. Schema optimization vs. number of generations in Breeder algorithm With results obtained in Table 1, we can conclude that Breeder genetic algorithm is capable to provide the best values for learning parameters, and thus our schemes were optimized for best performance. The results obtained by our nonlinear optimized schemes are significant better than those obtained in [10], [12], [17]. 5 Conclusions Using a Breeder genetic algorithm, we found automatically the optimal values for the learning parameters of many reinforcement schemes, in order to reach the best performance, measured in number of iterations in learning process (“number of steps”). From graphical results of optimization process showed in Fig.2 and Fig. 3, we can conclude that scheme 4.3, R θ *(1 − δ ) (1 )* , − θ δ is more adequate for applications with less time allocated for schema optimization, and scheme 4.2, Rθ , θ *δ is very efficient if we allocate for optimization enough time. However, the new generic scheme θ Rδ , introduced in section 3, outperforms the other schemes in terms of speed and qualitative results in the learning process. There are many possibilities for choosing the form of parameters in generic scheme 2 1 Rγ γ such that the conditions (11) are satisfied. Breeder genetic algorithm, presented in section 4, can be used for optimization of parameters values regardless of choice of 1 2 γ ,γ . The graphical results obtained suggest than 1 2 γ =δ ,γ =θ , with 0 <δ ,θ <1 give better results than other more complicated choices. As further directions of study we want to rigorous prove or to invalidate this conjecture. References: [1] N. Baba, New Topics in Learning Automata: Theory and Applications, Lecture Notes in Control and Information Sciences, Berlin, Germany: Springer- Verlag, pp.750-758, 1984. [2] O. Buffet, A. Dutech, and F. Charpillet, Incremental reinforcement learning for designing multi-agent systems, In J. P. Müller, E. Andre, S. Sen, and C. Frasson, editors, Proceedings of the Fifth International Conference onAutonomous Agents, Montreal, Canada, ACM Press, pp. 31–38, 2001. [3] M. Dorigo, Introduction to the Special Issue on Learning Autonomous Robots, IEEE Trans. on Systems, Man and Cybernetics - part B, Vol. 26, No. 3, pp. 361- 364, 1996. [4] S. Lakshmivarahan, M.A.L. Thathachar, Absolutely Expedient Learning Algorithms for Stochastic Automata, IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-6, pp. 281-286, 1973. [5] J. Moody, Y. Liu, M. Saffell, and K. Youn. Stochastic direct reinforcement: Application to simple games with recurrence, In Proceedings of Artificial Multiagent Learning. Papers from the 2004 AAAI Fall Symposium, Technical Report FS-04-02. [6] H. Mühlenbein, D. Schlierkamp-Voosen, The science of breeding and its application to the breeder genetic algorithm, Evolutionary Computation, vol. 1, pp. 335- 360, 1994. [7] K. S. Narendra, M. A. L. Thathachar, Learning Automata: an introduction, Prentice-Hall, 1989. [8] C. Rivero, Characterization of the absolutely expedient learning algorithms for stochastic automata in a non-discrete space of actions, ESANN'2003 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), pp. 307-312, 2003. [9] D. Simian, F. Stoica, A New Nonlinear Reinforcement Scheme for Stochastic Learning Automata, Proceedings of the 12th WSEAS International Conference on Automatic Control, Modelling & Simulation, Catania, Italy, pp. 450-454, 2010. [10] F. Stoica, E. M. Popa, An Absolutely Expedient Learning Algorithm for Stochastic Automata, WSEAS Transactions on Computers, Issue 2, Volume 6, pp. 229- 235, 2007. [11] F. Stoica, D. Simian, Automatic control based on Wasp Behavioral Model and Stochastic Learning Automata. Mathematics and Computers in Science and Engineering Series, Proceedings of 10th WSEAS Conference on Mathematical Methods, Computational ISBN: 978-960-474-297-4 336
  • 6. Proceedings of the European Computing Conference Techniques and Intelligent Systems (MAMECTIS '08), Corfu 2008, WSEAS Press, pp. 289-295, 2008. [12] F. Stoica, E. M. Popa, I. Pah, A new reinforcement scheme for stochastic learning automata – Application to Automatic Control, Proceedings of the International Conference on e-Business, Porto, Portugal, pp. 45-50, 2008. [13] F. Stoica, D. Simian, Optimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithm, Proceedings of the 11th WSEAS International Conference on Evolutionary Computing (EC'10), Iaşi, Romania, pp. 273-278, 2010. [14] R. Sutton, A. Barto, Reinforcement learning: An introduction, MIT-press, Cambridge, MA, 1998. [15] C. Ünsal, P. Kachroo, J. S. Bay, Simulation Study of Learning Automata Games in Automated Highway Systems, 1st IEEE Conference on Intelligent Transportation Systems (ITSC’97), Boston, Massachusetts, 1997 [16] C. Ünsal, P. Kachroo, J. S. Bay, Simulation Study of Multiple Intelligent Vehicle Control using Stochastic Learning Automata, IEEE Transactions on Systems, Man and Cybernetics – Part A, Systems and Human, pp.1-42, 1997. [17] C. Ünsal, Intelligent Navigation of Autonomous Vehicles in an Automated Highway System: Learning Methods and Interacting Vehicles Approach, dissertation thesis, Pittsburg University, Virginia, USA, 1997. ISBN: 978-960-474-297-4 337