12th WSEAS International Conference on COMPUTERS, Heraklion, Greece, July 23-25, 2008 
A new Evolutionary Reinforcement Scheme for Stochastic Learning 
Automata 
FLORIN STOICA, EMIL M. POPA 
Computer Science Department 
“Lucian Blaga” University Sibiu 
Str. Dr. Ion Ratiu 5-7, 550012, Sibiu 
ROMANIA 
Abstract: - A stochastic automaton can perform a finite number of actions in a random environment. When a specific 
action is performed, the environment responds by producing an environment output that is stochastically related to the 
action. The aim is to design an automaton, using an evolutionary reinforcement scheme (the basis of the learning 
process), that can determine the best action guided by past actions and responses. Using Stochastic Learning Automata 
techniques, we introduce a decision/control method for intelligent vehicles receiving data from on-board sensors or 
from the localization system of highway infrastructure. 
Key-Words: - Stochastic Learning Automata, Reinforcement Learning, Intelligent Vehicle Control 
1 Introduction 
An automaton is a machine or control mechanism 
designed to automatically follow a predetermined 
sequence of operations or respond to encoded 
instructions. The term stochastic emphasizes the adaptive nature of the automaton described here: it does not follow predetermined rules, but adapts to changes in its environment. This
adaptation is the result of the learning process. Learning 
is defined as any permanent change in behavior as a 
result of past experience, and a learning system should 
therefore have the ability to improve its behavior with 
time, toward a final goal. 
The stochastic automaton attempts a solution of the 
problem without any information on the optimal action 
(initially, equal probabilities are attached to all the 
actions). One action is selected at random, the response 
from the environment is observed, action probabilities 
are updated based on that response, and the procedure is 
repeated. A stochastic automaton acting as described to 
improve its performance is called a learning automaton. 
The response values can be represented in three 
different models. In the P-model, the response values are 
either 0 or 1, in the S-model the response values are 
continuous in the range (0, 1) and in the Q-model the 
values belong to a finite set of discrete values in the 
range (0, 1). 
The environment can further be divided into two types, stationary and nonstationary. In a stationary environment
the penalty probabilities will never change. In a 
nonstationary environment the penalties will change 
over time. 
The algorithm that guarantees the desired learning 
process is called a reinforcement scheme [5]. The 
reinforcement scheme is the basis of the learning process 
for learning automata. The general solution for 
absolutely expedient schemes was found by 
Lakshmivarahan and Thathachar [8]. However, these 
reinforcement schemes are theoretically valid only in 
stationary environments. 
For a nonstationary automata environment resulting 
from a changing physical environment, we propose a 
new evolutionary reinforcement scheme, based on an 
original co-mutation operator, called LR-Mijn. 
2 Stochastic learning automata 
Mathematically, the environment of a stochastic automaton is defined by a triple {α, c, β}, where α = {α_1, α_2, ..., α_r} represents a finite set of actions, being the input to the environment, β = {β_1, β_2} represents a binary response set, and c = {c_1, c_2, ..., c_r} is a set of penalty probabilities, where c_i is the probability that action α_i will result in an unfavourable response. Given that β(n) = 0 is a favourable outcome and β(n) = 1 is an unfavourable outcome at time instant n (n = 0, 1, 2, ...), the element c_i of c is defined mathematically by:

c_i = P(β(n) = 1 | α(n) = α_i),   i = 1, 2, ..., r
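For illustration, a stationary P-model environment of this kind can be simulated by drawing a Bernoulli response with the penalty probability of the chosen action. The class below is a minimal sketch; the class and method names are ours, not taken from the paper.

import java.util.Random;

// Minimal sketch of a stationary P-model environment:
// respond(i) returns 1 (penalty) with probability c[i], else 0 (reward).
public class Environment {
    private final double[] c;               // penalty probabilities c_1, ..., c_r
    private final Random rng = new Random();

    public Environment(double[] penaltyProbabilities) {
        this.c = penaltyProbabilities.clone();
    }

    // beta(n) for the chosen action alpha_i (0-based index)
    public int respond(int action) {
        return rng.nextDouble() < c[action] ? 1 : 0;
    }
}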
A learning automaton generates a sequence of 
actions on the basis of its interaction with the 
environment. If the automaton is “learning” in the 
process, its performance must be superior to “intuitive” 
methods [6]. Consider a stationary random environment 
with penalty probabilities {c_1, c_2, ..., c_r} defined above. In order to describe the reinforcement schemes, we define p(n), the vector of action probabilities:

p_i(n) = P(α(n) = α_i),   i = 1, ..., r

We define a quantity M(n) as the average penalty for a given action probability vector:

M(n) = P(β(n) = 1 | p(n)) = Σ_{i=1}^{r} P(β(n) = 1 | α(n) = α_i) · P(α(n) = α_i) = Σ_{i=1}^{r} c_i p_i(n)
An automaton is absolutely expedient if the expected 
value of the average penalty at one iteration step is less 
than it was at the previous step for all steps: 
M(n +1) < M(n) for all n [7]. 
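For a given probability vector, the average penalty can be computed directly from this definition; a small helper sketch (our own, for illustration):

// Average penalty M(n) = sum_i c_i * p_i(n) for a given action probability vector p.
public static double averagePenalty(double[] c, double[] p) {
    double m = 0.0;
    for (int i = 0; i < c.length; i++) {
        m += c[i] * p[i];
    }
    return m;
}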
Updating action probabilities can be represented as 
follows: 
p(n+1) = T[p(n), α(n), β(n)],
where T is a mapping. This formula says that the next action probability vector p(n+1) is computed from the current probability vector p(n), the action performed and the response received from the environment. If p(n+1) is a linear function of p(n),
the reinforcement scheme is said to be linear; otherwise 
it is termed nonlinear. 
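As a concrete classical example of such a linear mapping T (not the scheme proposed in this paper), the well-known linear reward-inaction update increases the probability of a rewarded action and leaves the vector unchanged on penalty. A sketch, assuming a learning rate a in (0, 1):

// Classical linear reward-inaction (L_{R-I}) update, shown only as an example
// of a linear mapping T; p is the action probability vector, a the learning rate.
public static void linearRewardInaction(double[] p, int action, int beta, double a) {
    if (beta == 0) {                                   // favourable response: reinforce the chosen action
        for (int j = 0; j < p.length; j++) {
            p[j] = (j == action) ? p[j] + a * (1.0 - p[j])
                                 : (1.0 - a) * p[j];
        }
    }
    // on an unfavourable response (beta == 1) the probabilities are left unchanged
}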
Absolutely expedient learning schemes are presently 
the only class of schemes for which necessary and 
sufficient conditions of design are available. 
However, the need for learning and adaptation in 
systems is mainly due to the fact that the environment 
changes with time. The performance of a learning 
automaton should be judged in such a context. If a 
learning automaton with a fixed strategy is used in a 
nonstationary environment, it may become less 
expedient, and even non-expedient. The learning scheme 
must have sufficient flexibility to track the better actions. 
If the action probabilities of the learning automata are functions of the status of the physical environment (as in the case of an Intelligent Vehicle Control System), building a deterministic mathematical model of this physical environment may be extremely difficult, if not impossible.
For a nonstationary automata environment resulting 
from a changing physical environment, we propose a 
new evolutionary reinforcement scheme, based on an 
original co-mutation operator, called LR-Mijn. 
3 The LR-Mijn operator 
In [10], a co-mutation operator called Mijn was presented, capable of operating on a set of adjacent bits in a single step. Its properties were examined and compared
against those of the bit-flip mutation. A simple 
Evolutionary Algorithm, called MEA, based only on 
selection and Mijn, was experimentally used on some 
classical test functions widely known from literature, in 
order to compare its performance against that offered by 
a classical Genetic Algorithm (SGA). The obtained results showed the superiority of the MEA algorithm over the SGA [12].
The aim of this section is to introduce a new co-mutation 
operator called LR-Mijn, derived from Mijn. The 
LR-MEA algorithm, based on the LR-Mijn operator, was 
used for the optimization of some well-known 
benchmark functions and its performance was compared 
against that offered by algorithms using the Mijn 
operator. The obtained results proved the effectiveness 
of our LR-Mijn operator [14]. 
Let us consider a generic alphabet A = {a_1, a_2, ..., a_s} composed of s ≥ 2 different symbols. The set of all sequences of length l over the alphabet A will be denoted by Σ = A^l. In the following we shall denote by σ a generic string σ = σ_{l-1} ... σ_0 ∈ Σ, where σ_q ∈ A for all q ∈ {0, ..., l-1}.
The Mijn operator is capable of operating on a set of adjacent bits in a single step, starting from a randomly chosen position p and going left within the chromosome [10].
In this section we introduce a new co-mutation operator called LR-Mijn. Our LR-Mijn operator finds the longest sequence of σ_p elements situated to the left or to the right of position p. If the longest sequence is to the left of p, LR-Mijn behaves as Mijn; otherwise, LR-Mijn operates on the set of bits starting from p and going to the right.
Through σ(q,i) we denote that on position q within the sequence σ there is the symbol a_i of the alphabet A, and σ(p,i)^{right,n}_{left,m} specifies the presence of symbol a_i on position p within the sequence σ, between right symbols a_n on the right and left symbols a_m on the left. We suppose that σ = σ(l-1) ... σ(p+left+1,m) σ(p,i)^{right,i}_{left,i} σ(p-right-1,n) ... σ(0).
Definition 3.1 Formally, the LR-Mijn operator is defined as follows:
(i) If p ≠ right and p ≠ l – left – 1, then

LR-Mijn(σ) =
    σ(l-1) ... σ(p+left+1,i) σ(p,m)^{right,i}_{left,m} σ(p-right-1,n) ... σ(0),   for left > right
    σ(l-1) ... σ(p+left+1,m) σ(p,n)^{right,n}_{left,i} σ(p-right-1,i) ... σ(0),   for left < right
    either of the above, with probability 0.5,   for left = right
(ii) If p = right and p ≠ l – left – 1, then σ = σ(l-1) ... σ(p+left+1,m) σ(p,i)^{right,i}_{left,i} and

LR-Mijn(σ) = σ(l-1) ... σ(p+left+1,m) σ(p,k)^{right,k}_{left,i},

where k ≠ i (randomly chosen).
(iii) If p ≠ right and p = l – left – 1, then σ = σ(p,i)^{right,i}_{left,i} σ(p-right-1,n) ... σ(0) and

LR-Mijn(σ) = σ(p,k)^{right,i}_{left,k} σ(p-right-1,n) ... σ(0),

where k ≠ i (randomly chosen).
(iv) If p = right and p = l – left – 1, then σ = σ(p,i)^{right,i}_{left,i} and

LR-Mijn(σ) =
    σ(p,k)^{right,i}_{left,k},   for left > right, where k ≠ i
    σ(p,k)^{right,k}_{left,i},   for left < right, where k ≠ i
    either of the above, with probability 0.5,   for left = right (k ≠ i, randomly chosen)
As an example, let us consider the binary case, the string σ = 11110000 and the randomly chosen application point p = 2. In this case σ_2 = 0, so we have to find the longest sequence of 0s within the string σ, starting from position p. This sequence goes to the right and reaches the end of the string without meeting any occurrence of 1, so the new string obtained after the application of LR-Mijn is 11110111.
If we suppose that the randomly chosen point is p = 6, we have in this case σ_6 = 1. Again, the longest sequence of elements 1 within σ goes to the right, so we go rightward and look for the first occurrence of a 0 in σ. At position 3 we have σ_3 = 0, so we stop. Then we replace the string components from position 6 to position 3, and the new string obtained after the application of LR-Mijn is 10001000.
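A minimal Java sketch of LR-Mijn for the binary case, written to reproduce the two worked examples above, is given below (positions are counted from the right, as in the text; the class and method names are ours):

import java.util.Random;

// Binary sketch of the LR-Mijn co-mutation. bits[l-1-q] is the bit at position q,
// so position 0 is the rightmost character of the string.
public class LRMijn {
    private static final Random RNG = new Random();

    public static String mutate(String sigma, int p) {
        int l = sigma.length();
        char[] bits = sigma.toCharArray();
        char symbol = bits[l - 1 - p];

        // length of the run of 'symbol' to the left (higher positions) and to the right (lower positions) of p
        int left = 0;
        while (p + left + 1 < l && bits[l - 1 - (p + left + 1)] == symbol) left++;
        int right = 0;
        while (p - right - 1 >= 0 && bits[l - 1 - (p - right - 1)] == symbol) right++;

        boolean goLeft = left > right || (left == right && RNG.nextBoolean());
        if (goLeft) {
            // complement the run from p up to (and including) the first different bit, if any
            int top = Math.min(p + left + 1, l - 1);
            for (int q = p; q <= top; q++) flip(bits, l, q);
        } else {
            int bottom = Math.max(p - right - 1, 0);
            for (int q = bottom; q <= p; q++) flip(bits, l, q);
        }
        return new String(bits);
    }

    private static void flip(char[] bits, int l, int q) {
        bits[l - 1 - q] = (bits[l - 1 - q] == '0') ? '1' : '0';
    }

    public static void main(String[] args) {
        System.out.println(mutate("11110000", 2)); // expected 11110111
        System.out.println(mutate("11110000", 6)); // expected 10001000
    }
}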
In the following, we will consider that A is the binary alphabet, A = {0, 1}. We suppose that σ represents one and only one integer or real variable. We shall call the value of the string σ the integer number between 0 and 2^l − 1 corresponding to the string encoding, and the jump length the difference between the values represented by LR-Mijn(σ)/Mijn(σ) and σ, respectively. In cases (ii), (iii) and (iv), the LR-Mijn operator will produce so-called long jumps.
In the following we will present some properties of 
the LR-Mijn operator, in order to evaluate the number of 
long jumps performed by our new co-mutation operator. 
Property 3.1 [14] LR-Mijn allows performing a number of long jumps equal to:

γ' = (4 · (2^l − 3) + c_l) / 3,

where c_l = 2 for l even and c_l = 1 for l odd.
The proof of this property can be found in [14]. 
Property 3.2 [14] The relative number of long jumps caused by the LR-Mijn operator is equal to:

ρ' = (number of long jumps) / (total number of possible jumps) = γ' / (l · 2^l) = (4 · (2^l − 3) + c_l) / (3 · l · 2^l),

where c_l is defined above.
Property 3.3 [14] The probability of performing long jumps by the LR-Mijn operator tends to 4 / (3 · l).
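As a quick numerical check of Properties 3.1-3.3 as reconstructed above, the formulas can be evaluated for a few chromosome lengths:

// Evaluates gamma' = (4*(2^l - 3) + c_l)/3 and rho' = gamma'/(l*2^l) for a few
// lengths l, and compares rho' with the asymptotic value 4/(3*l).
public class LongJumpCount {
    public static void main(String[] args) {
        for (int l : new int[]{8, 16, 32}) {
            long twoL = 1L << l;
            long cl = (l % 2 == 0) ? 2 : 1;
            long gamma = (4 * (twoL - 3) + cl) / 3;
            double rho = (double) gamma / (l * (double) twoL);
            System.out.printf("l=%d  gamma'=%d  rho'=%.6f  4/(3l)=%.6f%n",
                              l, gamma, rho, 4.0 / (3 * l));
        }
    }
}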
The most interesting point in the co-mutation 
operator presented above is the fact that it allows long 
jumps, thus the search can reach very far points from 
where the search currently is. The LR-Mijn operator 
allows performing longer jumps which are not possible 
in the classical Bit-Flip Mutation (BFM) used in GAs. 
This means that our co-mutation operator has about the 
same capabilities of local search as ordinary mutation 
has, and it has also the possibility of performing jumps 
to reach far regions in the search space which cannot be 
reached by BFM. Properties 3.1-3.3 show that the LR-Mijn operator performs more long jumps than Mijn [12,14], which leads to a better convergence of an Evolutionary Algorithm based on LR-Mijn in comparison with an algorithm based on the Mijn operator. This statement was verified through experimental results in [14]. The following section presents an Evolutionary Reinforcement Scheme based only on selection and LR-Mijn.
4 A new evolutionary reinforcement 
scheme 
We denote by K the size of the population. The proposed learning algorithm, which performs the update of the probability vector of the stochastic automaton, is described in the following:
p[] function Learning-LR-Mijn(action)
begin
    t = 0;
    - initialize randomly population P(t) with K elements;
    - evaluate P(t) by using the fitness function;
    while not Terminated
        for j = 1 to K-1 do
            - select randomly one element among the best T% from P(t);
            - mutate it using LR-Mijn, performing a mutation within the gene which contains the probability of the action selected by the automaton;
            - insert the new chromosome into the population P'(t);
        end for
        - choose the best element from P(t) and insert it into P'(t);
        - evaluate P'(t) using the fitness function;
        P(t+1) = P'(t);
        t = t + 1;
    end while
    - select the best chromosome from population P(t);
    - decode from that chromosome the real values representing the probabilities associated with the actions of the automaton;
    - return the obtained probabilities vector p[];
end
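One possible Java organization of this loop is sketched below. Chromosome and its methods are hypothetical placeholders for details the paper leaves open (the chromosome encoding, the termination test, the elite fraction T%); the sketch only mirrors the structure of the pseudocode above.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class LearningLRMijn {
    private final Random rng = new Random();

    // Hypothetical chromosome abstraction; encoding and LR-Mijn mutation are left open.
    public interface Chromosome {
        Chromosome copy();
        void mutateGeneWithLRMijn(int action);   // LR-Mijn applied inside the gene of the given action
        double fitness();
        double[] decodeProbabilities();
    }

    public double[] learn(int action, int K, double eliteFraction, int generations,
                          List<Chromosome> population) {
        for (int t = 0; t < generations; t++) {              // stands in for "while not Terminated"
            List<Chromosome> next = new ArrayList<>();
            for (int j = 1; j <= K - 1; j++) {
                Chromosome child = selectFromBest(population, eliteFraction).copy();
                child.mutateGeneWithLRMijn(action);          // mutate using LR-Mijn
                next.add(child);                             // insert into P'(t)
            }
            next.add(best(population));                      // elitism: keep the best element of P(t)
            population = next;                               // P(t+1) = P'(t)
        }
        return best(population).decodeProbabilities();       // probability vector p[] of the automaton
    }

    private Chromosome selectFromBest(List<Chromosome> pop, double eliteFraction) {
        List<Chromosome> sorted = new ArrayList<>(pop);
        sorted.sort(Comparator.comparingDouble(Chromosome::fitness).reversed());
        int eliteSize = Math.max(1, (int) (eliteFraction * sorted.size()));
        return sorted.get(rng.nextInt(eliteSize));
    }

    private Chromosome best(List<Chromosome> pop) {
        return pop.stream()
                  .max(Comparator.comparingDouble(Chromosome::fitness))
                  .orElseThrow(IllegalStateException::new);
    }
}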
In the following, we denote with:
• N – the number of actions of the corresponding automaton;
• p_n^k – the probability of action n, stored within the chromosome k;
• β_n – the environment response for action n.
We define the fitness function as follows:

f : {1, ..., K} → ℝ,   f(k) = Σ_{n=1}^{N} p_n^k · (1 − β_n)

The relations

Σ_{n=1}^{N} p_n^k = 1,   ∀ k ∈ {1, ..., K}

must be verified after the application of LR-Mijn on any chromosome k from the population. In order to accomplish this, we proceed as follows: for the variable of index i (preferably one left unaltered by LR-Mijn), the sum S^k = Σ_{n=1, n≠i}^{N} p_n^k is calculated. The probability vector is then modified as follows:

If S^k > 1 then
    p_n^k := p_n^k · (1 / S^k),   ∀ n ∈ {1, ..., N} \ {i}
    p_i^k := 0
else
    p_i^k := 1 − S^k.
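A direct transcription of the fitness function and of this renormalization step might look as follows; the array layout p[k][n] for p_n^k and beta[n] for the responses is our own choice, not taken from the paper.

// Fitness f(k) = sum over actions n of p_n^k * (1 - beta_n); p[k][n] holds p_n^k.
public static double fitness(double[][] p, int[] beta, int k) {
    double f = 0.0;
    for (int n = 0; n < beta.length; n++) {
        f += p[k][n] * (1 - beta[n]);
    }
    return f;
}

// Restores sum_n p_n^k = 1 after LR-Mijn has altered chromosome k, using the
// probability of index i (preferably an index unaltered by the mutation) as slack.
public static void renormalize(double[][] p, int k, int i) {
    double s = 0.0;                          // S^k = sum of all probabilities except index i
    for (int n = 0; n < p[k].length; n++) {
        if (n != i) s += p[k][n];
    }
    if (s > 1.0) {
        for (int n = 0; n < p[k].length; n++) {
            if (n != i) p[k][n] /= s;        // p_n^k := p_n^k * (1 / S^k)
        }
        p[k][i] = 0.0;
    } else {
        p[k][i] = 1.0 - s;
    }
}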
5 Using stochastic learning automata for 
Intelligent Vehicle Control 
In this section, we present a method for intelligent 
vehicle control, having as theoretical background 
Stochastic Learning Automata. The aim here is to design 
an automata system that can learn the best possible 
action based on the data received from on-board sensors, 
or from roadside-to-vehicle communications. For our
model, we assume that an intelligent vehicle is capable 
of two sets of lateral and longitudinal actions. Lateral 
actions are LEFT (shift to left lane), RIGHT (shift to 
right lane) and LINE_OK (stay in current lane). 
Longitudinal actions are ACC (accelerate), DEC 
(decelerate) and SPEED_OK (keep current speed). An 
autonomous vehicle must be able to “sense” the 
environment around itself. Therefore, we assume that 
there are four different sensor modules on board the
vehicle (the headway module, two side modules and a 
speed module), in order to detect the presence of a 
vehicle traveling in front of the vehicle or in the 
immediately adjacent lane and to know the current speed 
of the vehicle. These sensor modules evaluate the 
information received from the on-board sensors or from 
the highway infrastructure in the light of the current 
automata actions, and send a response to the automata. 
The response from the physical environment is a combination of outputs from the sensor modules. Because an input parameter for the decision blocks is the action chosen by the stochastic automaton, it is necessary to use two distinct functions for mapping the outputs of the decision blocks into inputs for the two learning automata, namely the longitudinal automaton and the lateral automaton.
After updating the action probability vectors in both 
learning automata, using the evolutionary reinforcement 
scheme presented in section 4, the outputs from 
stochastic automata are transmitted to the regulation 
layer. The regulation layer handles the actions received 
from the two automata in a distinct manner, using for 
each of them a regulation buffer. If a received action was rewarded, it is introduced into the regulation buffer of the corresponding automaton; otherwise, a special value denoting an action penalized by the physical environment is introduced into the buffer. The
regulation layer does not carry out the action chosen 
immediately; instead, it carries out an action only if it is 
recommended k times consecutively by the automaton, 
where k is the length of the regulation buffer. After an 
action is executed, the action probability vector is initialized to 1/r, where r is the number of actions. When an action is executed, the regulation buffer is also reinitialized.
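The regulation buffer described here can be sketched as a counter of consecutive identical recommendations; the class below is our own illustration, not code from the simulator.

// Sketch of a regulation buffer: an action is carried out only after it has
// been recommended (and rewarded) k times in a row.
public class RegulationBuffer {
    public static final int PENALIZED = -1;   // marker for a penalized action
    private final int k;                      // required consecutive recommendations
    private int lastAction = PENALIZED;
    private int count = 0;

    public RegulationBuffer(int k) { this.k = k; }

    // Returns true when the recommended action should actually be executed.
    public boolean offer(int action, boolean rewarded) {
        int entry = rewarded ? action : PENALIZED;
        if (entry == lastAction && entry != PENALIZED) {
            count++;
        } else {
            lastAction = entry;
            count = (entry == PENALIZED) ? 0 : 1;
        }
        if (count == k) {
            count = 0;                        // buffer reinitialized after execution
            lastAction = PENALIZED;
            return true;
        }
        return false;
    }
}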
6 Sensor modules 
The four sensor modules mentioned above are decision blocks that calculate the response (reward/penalty) based on the last action chosen by the automaton. Table 1 describes the output of the decision blocks for the side sensors.
As seen in table 1, a penalty response is received 
from the left sensor module when the action is LEFT and 
there is a vehicle in the left or the vehicle is already 
traveling on the leftmost lane. There is a similar situation 
for the right sensor module. 
Left/Right Sensor Module

Actions     Vehicle in sensor range      No vehicle in sensor range
            or no adjacent lane          and adjacent lane exists
LINE_OK     0/0                          0/0
LEFT        1/0                          0/0
RIGHT       0/1                          0/0

Table 1 Outputs from the Left/Right Sensor Module
The Headway (Frontal) Module is defined as shown 
in table 2. If there is a vehicle at a close distance (< 
admissible distance), a penalty response is sent to the 
automaton for actions LINE_OK, SPEED_OK and 
ACC. All other actions (LEFT, RIGHT, DEC) are 
encouraged, because they may serve to avoid a collision. 
Headway Sensor Module

Actions     Vehicle in range (at a        No vehicle in range
            close frontal distance)
LINE_OK     1                             0
LEFT        0                             0
RIGHT       0                             0
SPEED_OK    1                             0
ACC         1                             0
DEC         0*                            0

Table 2 Outputs from the Headway Module
The Speed Module compares the actual speed with the desired speed and, based on the action chosen, sends a feedback to the longitudinal automaton. The reward response indicated by 0* (from the Headway Sensor Module) is different from the normal reward response, indicated by 0: this reward response has a higher priority and must override a possible penalty from other modules.
Speed Sensor Module

Actions     Speed: too slow    Acceptable speed    Speed: too fast
SPEED_OK    1                  0                   1
ACC         0                  0                   1
DEC         1                  0                   0

Table 3 Outputs from the Speed Module
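Table 3 translates directly into a small decision block. The sketch below uses our own action constants and a tolerance band around the desired speed (an assumption, since the paper does not specify how "too slow" and "too fast" are determined):

// Sketch of the Speed Module decision block following Table 3:
// returns 1 (penalty) or 0 (reward) for the longitudinal actions.
public class SpeedModule {
    public static final int SPEED_OK = 0, ACC = 1, DEC = 2;   // our own constants

    public int respond(int action, double speed, double desiredSpeed, double tolerance) {
        boolean tooSlow = speed < desiredSpeed - tolerance;
        boolean tooFast = speed > desiredSpeed + tolerance;
        switch (action) {
            case SPEED_OK: return (tooSlow || tooFast) ? 1 : 0;
            case ACC:      return tooFast ? 1 : 0;
            case DEC:      return tooSlow ? 1 : 0;
            default:       return 0;   // lateral actions are not judged by this module
        }
    }
}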
7 Implementation of a simulator 
This section describes an implementation of a simulator for the Intelligent Vehicle Control System. The entire system was implemented in Java and is based on the JADE platform.
JADE is a middleware that facilitates the 
development of multi-agent systems and applications 
conforming to FIPA standards for intelligent agents. Figure 1 shows the class diagram of the simulator. Each vehicle has an associated agent, responsible for its intelligent control.
The response of the physical environment is a 
combination of the outputs of all four sensor modules. 
The implementation of this combination for each automaton (longitudinal and lateral, respectively) is shown in figure 2 (the value 0* was substituted by 2).
A snapshot of the running simulator is shown in figure 3. 
Fig. 1 The class diagram of the simulator 
// environment response for Longitudinal Automaton 
public double reward(int action){ 
int combine; 
combine = Math.max(speedModule(action), 
frontModule(action)); 
if (combine == 2) combine = 0;
return combine; 
} 
// environment response for Lateral Automaton 
public double reward(int action){ 
int combine; 
combine = Math.max(leftRightModule(action), 
frontModule(action)); 
return combine; 
} 
Fig. 2 The physical environment response 
Fig. 3 A scenario executed in the simulator 
8 Conclusion 
Reinforcement learning has attracted rapidly increasing 
interest in the machine learning and artificial intelligence 
communities. Its promise is beguiling - a way of 
programming agents by reward and punishment without 
needing to specify how the task (i.e., behavior) is to be 
achieved. Reinforcement learning allows, at least in principle, bypassing the problems of building an explicit model of the behavior to be synthesized and its
counterpart, a meaningful learning base (supervised 
learning). 
The evolutionary reinforcement scheme presented in this paper is suitable for a nonstationary stochastic automata environment, resulting from a changing physical
environment. Used within a simulator of an Intelligent 
Vehicle Control System, this new reinforcement scheme 
has proved its efficiency. 
References: 
[1] A. Barto, S. Mahadevan, Recent advances in 
hierarchical reinforcement learning, Discrete-Event 
Systems journal, Special issue on Reinforcement 
Learning, 2003. 
[2] R. Sutton, A. Barto, Reinforcement learning: An 
introduction, MIT-press, Cambridge, MA, 1998. 
[3] O. Buffet, A. Dutech, F. Charpillet, Incremental 
reinforcement learning for designing multi-agent 
systems, In J. P. Müller, E. Andre, S. Sen, and C. 
Frasson, editors, Proceedings of the Fifth 
International Conference on Autonomous Agents, pp. 31–32, Montreal, Canada, 2001. ACM Press.
[4] J. Moody, Y. Liu, M. Saffell, K. Youn, Stochastic 
direct reinforcement: Application to simple games 
with recurrence, In Proceedings of Artificial 
Multiagent Learning. Papers from the 2004 AAAI 
Fall Symposium, Technical Report FS-04-02, 2004.
[5] C. Ünsal, P. Kachroo, J. S. Bay, Simulation Study of 
Multiple Intelligent Vehicle Control using Stochastic 
Learning Automata, TRANSACTIONS, the Quarterly 
Archival Journal of the Society for Computer 
Simulation International, volume 14, number 4, 
December 1997. 
[6] C. Ünsal, P. Kachroo, J. S. Bay, Multiple Stochastic 
Learning Automata for Vehicle Path Control in an 
Automated Highway System, IEEE Transactions on 
Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 29, no. 1, January 1999.
[7] K. S. Narendra, M. A. L. Thathachar, Learning 
Automata: an introduction, Prentice-Hall, 1989. 
[8] S. Lakshmivarahan, M.A.L. Thathachar, Absolutely 
Expedient Learning Algorithms for Stochastic 
Automata, IEEE Transactions on Systems, Man and 
Cybernetics, vol. SMC-6, pp. 281-286, 1973 
[9] F. Stoica, E. M. Popa, An Absolutely Expedient 
Learning Algorithm for Stochastic Automata, 
WSEAS Transactions on Computers, Issue 2, Volume 
6, February 2007, ISSN 1109-2750, pp. 229-235 
[10] I. De Falco, A. Iazzetta, A. Della Cioppa, E. 
Tarantino, Mijn Mutation Operator for Aerofoil 
Design Optimisation, Research Institute on Parallel 
Information Systems, National Research Council of 
Italy, 2001 
[11] I. De Falco, R. Del Balio, A. Della Cioppa, E. 
Tarantino, A Comparative Analysis of Evolutionary 
Algorithms for Function Optimisation, Research 
Institute on Parallel Information Systems, National 
Research Council of Italy, 1998 
[12] I. De Falco, A. Iazzetta, A. Della Cioppa, E. 
Tarantino, The Effectiveness of Co-mutation in 
Evolutionary Algorithms: the Mijn operator, Research 
Institute on Parallel Information Systems, National 
Research Council of Italy, 2000 
[13] H. Kita, A comparison study of self-adaptation in 
evolution strategies and real-code genetic algorithms, 
Evolutionary Computation, 9(2):223-241, 2001 
[14] F. Stoica, D. Simian, C. Simian, A new co-mutation 
genetic operator, In Proceedings of Applications of 
Evolutionary Computing in Modeling and 
Development of Intelligent Systems, within the 9th 
WSEAS International Conference on 
EVOLUTIONARY COMPUTING (EC '08), May 
2008, Sofia, Bulgaria. 

Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and Pitfalls
 
Interferons.pptx.
Interferons.pptx.Interferons.pptx.
Interferons.pptx.
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptx
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
 
The Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and FunctionThe Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and Function
 
projectile motion, impulse and moment
projectile  motion, impulse  and  momentprojectile  motion, impulse  and  moment
projectile motion, impulse and moment
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptx
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
 
Loudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxLoudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptx
 

A new Evolutionary Reinforcement Scheme for Stochastic Learning Automata

2 Stochastic learning automata
Mathematically, the environment of a stochastic automaton is defined by a triple {α, c, β}, where α = {α_1, α_2, ..., α_r} is a finite set of actions forming the input to the environment, β = {β_1, β_2} is a binary response set, and c = {c_1, c_2, ..., c_r} is a set of penalty probabilities, c_i being the probability that action α_i results in an unfavourable response. Given that β(n) = 0 is a favourable outcome and β(n) = 1 is an unfavourable outcome at time instant n (n = 0, 1, 2, ...), the element c_i of c is defined by

c_i = P(β(n) = 1 | α(n) = α_i), i = 1, 2, ..., r.

A learning automaton generates a sequence of actions on the basis of its interaction with the environment. If the automaton is "learning" in the process, its performance must be superior to that of "intuitive" methods [6].
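To make the P-model environment concrete, the following minimal sketch simulates a stationary environment that answers an action with a penalty (1) with probability c_i. The class name, the array representation of c and the use of java.util.Random are illustrative assumptions, not part of the paper.

import java.util.Random;

// Minimal sketch (illustrative only) of a stationary P-model environment:
// action i elicits the unfavourable response beta = 1 with probability c[i].
public class StationaryEnvironment {
    private final double[] c;                 // penalty probabilities c_1, ..., c_r
    private final Random random = new Random();

    public StationaryEnvironment(double[] penaltyProbabilities) {
        this.c = penaltyProbabilities.clone();
    }

    // Returns 1 (penalty) with probability c[action], otherwise 0 (reward).
    public int response(int action) {
        return (random.nextDouble() < c[action]) ? 1 : 0;
    }
}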
Consider a stationary random environment with the penalty probabilities {c_1, c_2, ..., c_r} defined above. In order to describe reinforcement schemes, we define p(n), the vector of action probabilities:

p_i(n) = P(α(n) = α_i), i = 1, ..., r.

The quantity M(n) denotes the average penalty for a given action probability vector:

M(n) = P(β(n) = 1 | p(n)) = Σ_{i=1}^{r} P(β(n) = 1 | α(n) = α_i) · P(α(n) = α_i) = Σ_{i=1}^{r} c_i p_i(n).

An automaton is absolutely expedient if the expected value of the average penalty at one iteration step is less than it was at the previous step, for all steps: M(n+1) < M(n) for all n [7]. The update of the action probabilities can be represented as

p(n+1) = T[p(n), α(n), β(n)],

where T is a mapping: the next action probability vector p(n+1) is computed from the current vector p(n), the action taken and the response received from the environment. If p(n+1) is a linear function of p(n), the reinforcement scheme is said to be linear; otherwise it is termed nonlinear.
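As an illustration of such a mapping T, the sketch below implements the classical linear reward-inaction (L_R-I) update. It is shown only for contrast with what follows and is not the evolutionary scheme proposed in this paper; the class, field and method names are assumptions.

// Illustrative only: a classical linear reward-inaction (L_R-I) update,
// one possible mapping T[p(n), action, beta]; not the scheme proposed here.
public class LinearRewardInaction {
    private final double a;        // learning rate, 0 < a < 1
    private final double[] p;      // action probability vector p(n)

    public LinearRewardInaction(int numberOfActions, double learningRate) {
        this.a = learningRate;
        this.p = new double[numberOfActions];
        java.util.Arrays.fill(p, 1.0 / numberOfActions);   // equal initial probabilities
    }

    // beta = 0 is a favourable response, beta = 1 an unfavourable one (P-model).
    public void update(int action, int beta) {
        if (beta == 0) {                       // reward: move towards the chosen action
            for (int j = 0; j < p.length; j++) {
                p[j] = (j == action) ? p[j] + a * (1.0 - p[j]) : (1.0 - a) * p[j];
            }
        }
        // penalty: probabilities are left unchanged in L_R-I
    }

    public double[] probabilities() { return p.clone(); }
}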
Absolutely expedient learning schemes are presently the only class of schemes for which necessary and sufficient design conditions are available. However, the need for learning and adaptation in systems is mainly due to the fact that the environment changes with time, and the performance of a learning automaton should be judged in that context. If a learning automaton with a fixed strategy is used in a nonstationary environment, it may become less expedient, or even non-expedient; the learning scheme must have sufficient flexibility to track the better actions. If the action probabilities of the learning automaton are functions of the status of the physical environment (as in an Intelligent Vehicle Control System), building a deterministic mathematical model of this physical environment may be extremely difficult, if not impossible. For a nonstationary automata environment resulting from a changing physical environment, we propose a new evolutionary reinforcement scheme, based on an original co-mutation operator called LR-Mijn.

3 The LR-Mijn operator
In [10] a co-mutation operator called Mijn was presented, capable of operating on a set of adjacent bits in one single step. Its properties were examined and compared against those of the bit-flip mutation. A simple evolutionary algorithm called MEA, based only on selection and Mijn, was experimentally applied to some classical test functions widely known from the literature, in order to compare its performance against that of a classical genetic algorithm (SGA); the results proved the superiority of MEA over the SGA [12].

The aim of this section is to introduce a new co-mutation operator, called LR-Mijn, derived from Mijn. The LR-MEA algorithm, based on the LR-Mijn operator, was used for the optimization of some well-known benchmark functions and its performance was compared against that of algorithms using the Mijn operator; the results proved the effectiveness of the LR-Mijn operator [14].

Let us consider a generic alphabet A = {a_1, a_2, ..., a_s} composed of s ≥ 2 different symbols. The set of all sequences of length l over the alphabet A is denoted by Σ = A^l. In the following we denote by σ a generic string σ = σ_{l-1} ... σ_0 ∈ Σ, where σ_q ∈ A for all q ∈ {0, ..., l-1}.

The Mijn operator acts on a set of adjacent bits in one single step, starting from a randomly chosen position p and going left within the chromosome [10]. The LR-Mijn operator introduced here finds the longest sequence of σ_p elements situated to the left or to the right of position p. If the longest sequence is to the left of p, LR-Mijn behaves as Mijn; otherwise LR-Mijn operates on the set of bits starting from p and going to the right.

Through σ(q, i) we denote that position q of σ holds the symbol a_i of the alphabet A. We also use a decorated symbol σ(p, i) carrying the pairs (right, n) and (left, m), which specifies that a_i occupies position p, that the first symbol different from a_i met going right is a_n, reached after right positions, and that the first symbol different from a_i met going left is a_m, reached after left positions. We therefore suppose that σ = σ(l-1) ... σ(p+left+1, m) σ(p, i) σ(p-right-1, n) ... σ(0), where the positions between p+left+1 and p, and between p and p-right-1, all hold a_i.

Definition 3.1. The LR-Mijn operator is defined as follows.
(i) If p ≠ right and p ≠ l - left - 1 (neither run reaches an end of the string), then
LR-Mijn(σ) = σ(l-1) ... σ(p+left+1, i) σ(p, m) σ(p-right-1, n) ... σ(0)   for left > right,
LR-Mijn(σ) = σ(l-1) ... σ(p+left+1, m) σ(p, n) σ(p-right-1, i) ... σ(0)   for left < right,
and, for left = right, one of the two results above, each with probability 0.5. In other words, the operator acts towards the longer run: the run of a_i between p and the bounding position in that direction is rewritten together with p, and the bounding symbol itself is replaced by a_i (see the worked example below).
(ii) If p = right and p ≠ l - left - 1, so that σ = σ(l-1) ... σ(p+left+1, m) σ(p, i) ... σ(0) and the run of a_i to the right of p reaches the end of the string, then LR-Mijn(σ) = σ(l-1) ... σ(p+left+1, m) σ(p, k) ... σ(0), where k ≠ i is randomly chosen: position p and the run to its right are rewritten with a_k.
(iii) If p ≠ right and p = l - left - 1, so that σ = σ(p, i) σ(p-right-1, n) ... σ(0) and the run of a_i to the left of p reaches the end of the string, then LR-Mijn(σ) = σ(p, k) σ(p-right-1, n) ... σ(0), where k ≠ i is randomly chosen: position p and the run to its left are rewritten with a_k.
(iv) If p = right and p = l - left - 1 (both runs reach the ends of the string), then LR-Mijn(σ) rewrites position p together with the left run with a_k (k ≠ i, randomly chosen) when left > right, together with the right run when left < right, and chooses between the two with probability 0.5 when left = right.

As an example, consider the binary case, the string σ = 11110000 and the randomly chosen application point p = 2. Here σ_2 = 0, so we have to find the longest sequence of 0s within σ adjacent to position p. This sequence goes to the right and reaches the end of the string without any occurrence of 1, so the new string obtained after the application of LR-Mijn is 11110111. If instead the randomly chosen point is p = 6, then σ_6 = 1. The longest sequence of 1s again goes to the right, so we move rightwards looking for the first occurrence of a 0; it is found at position 3 (σ_3 = 0), so we stop there. The string components from position 6 down to position 3 are then replaced, and the new string obtained after the application of LR-Mijn is 10001000.

In the following we take A to be the binary alphabet, A = {0, 1}, and suppose that σ represents one and only one integer or real variable. We call the value of the string σ the integer between 0 and 2^l - 1 encoded by the string, and the jump length the difference between the values represented by LR-Mijn(σ) (respectively Mijn(σ)) and σ. In cases (ii), (iii) and (iv), the LR-Mijn operator produces so-called long jumps. The following properties evaluate the number of long jumps performed by the new co-mutation operator.

Property 3.1 [14]. LR-Mijn allows performing a number of long jumps equal to γ' = (4·(2^l + c_l) - 3)/3, where c_l = 2 for l even and c_l = 1 for l odd. The proof of this property can be found in [14].

Property 3.2 [14]. The relative number of long jumps caused by the LR-Mijn operator is ρ' = (number of long jumps)/(total number of possible jumps) = γ'/(l·2^l) = (4·(2^l + c_l) - 3)/(3·l·2^l), with c_l defined above.

Property 3.3 [14]. The probability that LR-Mijn performs a long jump therefore tends to 4/(3·l).

The most interesting point of the co-mutation operator presented above is that it allows long jumps, so the search can reach points very far from where it currently is. LR-Mijn performs longer jumps than are possible with the classical bit-flip mutation (BFM) used in GAs. This means that the co-mutation operator has about the same local-search capability as ordinary mutation, while also being able to jump to far regions of the search space that cannot be reached by BFM.
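As a concrete illustration of the binary case, the sketch below applies LR-Mijn to a bit string exactly as in the worked examples above; the class and helper names are assumptions, and the code is not the authors' implementation.

import java.util.Random;

// Illustrative sketch of LR-Mijn for the binary alphabet A = {0, 1}.
// The string is written sigma_{l-1} ... sigma_0, so bit position q corresponds
// to string index (l - 1 - q).
public class LrMijnBinary {
    private static final Random RANDOM = new Random();

    private static char bit(String s, int q) { return s.charAt(s.length() - 1 - q); }

    private static char flip(char b) { return b == '0' ? '1' : '0'; }

    // Applies LR-Mijn at a randomly chosen application point p.
    public static String mutate(String sigma) {
        int l = sigma.length();
        int p = RANDOM.nextInt(l);
        char s = bit(sigma, p);

        // Count the copies of sigma_p immediately to the left (higher positions)
        // and to the right (lower positions) of p.
        int left = 0;
        while (p + 1 + left <= l - 1 && bit(sigma, p + 1 + left) == s) left++;
        int right = 0;
        while (p - 1 - right >= 0 && bit(sigma, p - 1 - right) == s) right++;

        // Operate towards the longest run (ties broken with probability 0.5).
        boolean goLeft = left > right || (left == right && RANDOM.nextBoolean());

        // Flip the run starting at p, including the first different symbol if one
        // is met; if the run reaches an end of the string, the result is one of
        // the "long jumps" of cases (ii)-(iv).
        char[] bits = sigma.toCharArray();
        if (goLeft) {
            int end = Math.min(p + left + 1, l - 1);
            for (int q = p; q <= end; q++) bits[l - 1 - q] = flip(bit(sigma, q));
        } else {
            int end = Math.max(p - right - 1, 0);
            for (int q = end; q <= p; q++) bits[l - 1 - q] = flip(bit(sigma, q));
        }
        return new String(bits);
    }
}

For σ = 11110000, this sketch reproduces 11110111 when p = 2 and 10001000 when p = 6, matching the examples above.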
Properties 3.1-3.3 show that the LR-Mijn operator performs more long jumps than Mijn [12, 14], which leads to better convergence of an evolutionary algorithm based on LR-Mijn compared with an algorithm based on the Mijn operator. This statement was verified through experimental results in [14]. The following section presents an evolutionary reinforcement scheme based only on selection and LR-Mijn.

4 A new evolutionary reinforcement scheme
We denote by K the dimension of the population. The proposed learning algorithm, which updates the probability vector of the stochastic automaton, is described in the following:
p[] function Learning-LR-Mijn(action)
begin
    t = 0;
    initialize population P(t) randomly, with K elements;
    evaluate P(t) using the fitness function;
    while not Terminated
        for j = 1 to K-1 do
            select randomly one element among the best T% of P(t);
            mutate it using LR-Mijn, performing the mutation within the gene which
                contains the probability of the action selected by the automaton;
            insert the new chromosome into the population P'(t);
        end for
        choose the best element from P(t) and insert it into P'(t);
        evaluate P'(t) using the fitness function;
        P(t+1) = P'(t);
        t = t + 1;
    end while
    select the best chromosome from population P(t);
    decode from that chromosome the real values representing the probabilities
        associated with the actions of the automaton;
    return the obtained probability vector p[];
end

In the following we denote by:
- N the number of actions of the corresponding automaton;
- p_n^k the probability of action n stored within chromosome k;
- β_n the environment response for action n.

The fitness function is defined as

f : {1, ..., K} → ℝ,   f(k) = Σ_{n=1}^{N} p_n^k · (1 − β_n).

The relations Σ_{n=1}^{N} p_n^k = 1, for all k ∈ {1, ..., K}, must hold after the application of LR-Mijn on any chromosome k of the population. To accomplish this, we proceed as follows: for a variable of index i (preferably one left unaltered by LR-Mijn), the sum S^k = Σ_{n=1, n≠i}^{N} p_n^k is calculated, and the probability vector is then modified as follows: if S^k > 1, then p_i^k := 0 and p_n^k := p_n^k / S^k for all n ∈ {1, ..., N} \ {i}; otherwise p_i^k := 1 − S^k.
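A minimal sketch of the fitness evaluation and of the renormalization step described above, assuming a chromosome is stored as an array of N action probabilities; all names are illustrative.

// Illustrative sketch of the fitness function and the renormalization step.
public class EvolutionaryScheme {

    // f(k) = sum over n of p_n^k * (1 - beta_n), where beta[n] is the environment
    // response recorded for action n (0 = reward, 1 = penalty).
    public static double fitness(double[] chromosome, double[] beta) {
        double f = 0.0;
        for (int n = 0; n < chromosome.length; n++) {
            f += chromosome[n] * (1.0 - beta[n]);
        }
        return f;
    }

    // Restores the constraint sum_n p_n^k = 1 after a mutation, using the slack
    // index i described in the text (preferably one left unaltered by LR-Mijn).
    public static void renormalize(double[] chromosome, int i) {
        double s = 0.0;
        for (int n = 0; n < chromosome.length; n++) {
            if (n != i) s += chromosome[n];
        }
        if (s > 1.0) {
            chromosome[i] = 0.0;
            for (int n = 0; n < chromosome.length; n++) {
                if (n != i) chromosome[n] /= s;   // rescale the others to sum to 1
            }
        } else {
            chromosome[i] = 1.0 - s;              // give the remaining mass to index i
        }
    }
}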
5 Using stochastic learning automata for Intelligent Vehicle Control
In this section we present a method for intelligent vehicle control whose theoretical background is Stochastic Learning Automata. The aim is to design an automata system that can learn the best possible action based on the data received from on-board sensors or from roadside-to-vehicle communications.

For our model, we assume that an intelligent vehicle is capable of two sets of actions, lateral and longitudinal. The lateral actions are LEFT (shift to the left lane), RIGHT (shift to the right lane) and LINE_OK (stay in the current lane). The longitudinal actions are ACC (accelerate), DEC (decelerate) and SPEED_OK (keep the current speed).

An autonomous vehicle must be able to "sense" the environment around itself. We therefore assume that there are four different sensor modules on board the vehicle (the headway module, two side modules and a speed module), used to detect the presence of a vehicle travelling in front of the vehicle or in the immediately adjacent lane and to know the current speed of the vehicle. These sensor modules evaluate the information received from the on-board sensors or from the highway infrastructure in the light of the current automata actions, and send a response to the automata. The response from the physical environment is a combination of the outputs of the sensor modules. Because one input parameter of the decision blocks is the action chosen by the stochastic automaton, two distinct functions are needed to map the outputs of the decision blocks into inputs for the two learning automata, namely the longitudinal automaton and the lateral automaton.

After updating the action probability vectors of both learning automata with the evolutionary reinforcement scheme presented in section 4, the outputs of the stochastic automata are transmitted to the regulation layer. The regulation layer handles the actions received from the two automata separately, using a regulation buffer for each of them. If a received action was rewarded, it is introduced into the regulation buffer of the corresponding automaton; otherwise a special value denoting an action penalized by the physical environment is introduced. The regulation layer does not carry out the chosen action immediately; instead, it carries out an action only if it is recommended k times consecutively by the automaton, where k is the length of the regulation buffer. After an action is executed, the action probability vector is reinitialized to 1/r, where r is the number of actions, and the regulation buffer is reinitialized as well.
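The regulation-buffer rule described above could be sketched as follows; the buffer representation, the sentinel values and the method names are assumptions made for illustration.

// Illustrative sketch of the regulation buffer: an action is carried out only
// if the automaton recommends it (and it is rewarded) k times consecutively.
public class RegulationBuffer {
    public static final int NO_ACTION = -2;    // nothing to execute yet
    private static final int PENALIZED = -1;   // marker stored for a penalized action
    private final int[] buffer;                // holds the current streak
    private int filled = 0;

    public RegulationBuffer(int k) { this.buffer = new int[k]; }

    // Records the latest automaton action (or the penalized marker) and returns
    // the action to execute, or NO_ACTION if nothing should be executed yet.
    public int record(int action, boolean rewarded) {
        int value = rewarded ? action : PENALIZED;
        if (filled > 0 && buffer[filled - 1] != value) {
            filled = 0;                        // the streak was broken
        }
        buffer[filled++] = value;
        if (filled == buffer.length) {
            filled = 0;                        // buffer reinitialized in any case
            if (value != PENALIZED) return value;   // recommended k times consecutively
        }
        return NO_ACTION;
    }
}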
6 Sensor modules
The four teacher modules mentioned above are decision blocks that calculate the response (reward/penalty) based on the last action chosen by the automaton. Table 1 describes the output of the decision blocks for the side sensors. As seen in Table 1, a penalty response is received from the left sensor module when the action is LEFT and there is a vehicle to the left or the vehicle is already travelling in the leftmost lane; the situation is similar for the right sensor module.

Table 1  Outputs from the Left/Right Sensor Module
Action     Vehicle in sensor range, or no adjacent lane    No vehicle in sensor range, and adjacent lane exists
LINE_OK    0/0                                             0/0
LEFT       1/0                                             0/0
RIGHT      0/1                                             0/0

The Headway (Frontal) Module is defined as shown in Table 2. If there is a vehicle at a close distance (less than the admissible distance), a penalty response is sent to the automaton for the actions LINE_OK, SPEED_OK and ACC. All other actions (LEFT, RIGHT, DEC) are encouraged, because they may serve to avoid a collision.

Table 2  Outputs from the Headway Module
Action     Vehicle in range (at a close frontal distance)    No vehicle in range
LINE_OK    1                                                 0
LEFT       0                                                 0
RIGHT      0                                                 0
SPEED_OK   1                                                 0
ACC        1                                                 0
DEC        0*                                                0

The Speed Module compares the actual speed with the desired speed and, based on the action chosen, sends feedback to the longitudinal automaton. The reward response indicated by 0* in the Headway Sensor Module is different from the normal reward response indicated by 0: it has a higher priority and must override a possible penalty from the other modules.

Table 3  Outputs from the Speed Module
Action     Speed too slow    Acceptable speed    Speed too fast
SPEED_OK   1                 0                   1
ACC        0                 0                   1
DEC        1                 0                   0
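For illustration, the left/right decision block of Table 1 could look as follows; the integer encoding of the actions and the method signature are assumptions.

// Illustrative decision block for the side sensors (Table 1). Returns 1 (penalty)
// when a lane change is requested towards an occupied or non-existing lane.
public class SideSensorModule {
    public static final int LINE_OK = 0, LEFT = 1, RIGHT = 2;   // assumed encoding

    // 'leftModule' selects the left module (true) or the right module (false).
    public int response(int action, boolean leftModule,
                        boolean vehicleInRange, boolean adjacentLaneExists) {
        boolean blocked = vehicleInRange || !adjacentLaneExists;
        if (leftModule) {
            return (action == LEFT && blocked) ? 1 : 0;
        }
        return (action == RIGHT && blocked) ? 1 : 0;
    }
}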
7 Implementation of a simulator
This section describes an implementation of a simulator for the Intelligent Vehicle Control System. The entire system was implemented in Java and is based on the JADE platform. JADE is a middleware that facilitates the development of multi-agent systems and applications conforming to the FIPA standards for intelligent agents. The class diagram of the simulator is shown in figure 1; each vehicle has an associated agent responsible for its intelligent control.

The response of the physical environment is a combination of the outputs of all four sensor modules. The implementation of this combination for each automaton (longitudinal and lateral, respectively) is shown in figure 2, where the value 0* has been substituted by 2. A snapshot of the running simulator is shown in figure 3.

Fig. 1  The class diagram of the simulator

// environment response for the Longitudinal Automaton
public double reward(int action) {
    int combine;
    combine = Math.max(speedModule(action), frontModule(action));
    if (combine == 2) combine = 0;   // 0* case: the high-priority reward overrides any penalty
    return combine;
}

// environment response for the Lateral Automaton
public double reward(int action) {
    int combine;
    combine = Math.max(leftRightModule(action), frontModule(action));
    return combine;
}

Fig. 2  The physical environment response

Fig. 3  A scenario executed in the simulator

8 Conclusion
Reinforcement learning has attracted rapidly increasing interest in the machine learning and artificial intelligence communities. Its promise is beguiling: a way of programming agents by reward and punishment without needing to specify how the task (i.e., the behavior) is to be achieved. Reinforcement learning makes it possible, at least in principle, to bypass the problem of building an explicit model of the behavior to be synthesized and of its counterpart, a meaningful learning base (supervised learning).

The evolutionary reinforcement scheme presented in this paper is suitable for a nonstationary stochastic automata environment resulting from a changing physical environment. Used within a simulator of an Intelligent Vehicle Control System, the new reinforcement scheme has proved its efficiency.

References:
[1] A. Barto, S. Mahadevan, Recent advances in hierarchical reinforcement learning, Discrete-Event Systems journal, Special issue on Reinforcement Learning, 2003.
[2] R. Sutton, A. Barto, Reinforcement learning: An introduction, MIT Press, Cambridge, MA, 1998.
[3] O. Buffet, A. Dutech, F. Charpillet, Incremental reinforcement learning for designing multi-agent systems, In J. P. Müller, E. Andre, S. Sen, and C. Frasson, editors, Proceedings of the Fifth International Conference on Autonomous Agents, pp. 31-32, Montreal, Canada, 2001, ACM Press.
[4] J. Moody, Y. Liu, M. Saffell, K. Youn, Stochastic direct reinforcement: Application to simple games with recurrence, In Proceedings of Artificial Multiagent Learning, Papers from the 2004 AAAI Fall Symposium, Technical Report FS-04-02, 2004.
[5] C. Ünsal, P. Kachroo, J. S. Bay, Simulation Study of Multiple Intelligent Vehicle Control using Stochastic Learning Automata, TRANSACTIONS, the Quarterly Archival Journal of the Society for Computer Simulation International, volume 14, number 4, December 1997.
[6] C. Ünsal, P. Kachroo, J. S. Bay, Multiple Stochastic Learning Automata for Vehicle Path Control in an Automated Highway System, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 29, no. 1, January 1999.
[7] K. S. Narendra, M. A. L. Thathachar, Learning Automata: An Introduction, Prentice-Hall, 1989.
[8] S. Lakshmivarahan, M. A. L. Thathachar, Absolutely Expedient Learning Algorithms for Stochastic Automata, IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-6, pp. 281-286, 1973.
[9] F. Stoica, E. M. Popa, An Absolutely Expedient Learning Algorithm for Stochastic Automata, WSEAS Transactions on Computers, Issue 2, Volume 6, February 2007, ISSN 1109-2750, pp. 229-235.
[10] I. De Falco, A. Iazzetta, A. Della Cioppa, E. Tarantino, Mijn Mutation Operator for Aerofoil Design Optimisation, Research Institute on Parallel Information Systems, National Research Council of Italy, 2001.
[11] I. De Falco, R. Del Balio, A. Della Cioppa, E. Tarantino, A Comparative Analysis of Evolutionary Algorithms for Function Optimisation, Research Institute on Parallel Information Systems, National Research Council of Italy, 1998.
[12] I. De Falco, A. Iazzetta, A. Della Cioppa, E. Tarantino, The Effectiveness of Co-mutation in Evolutionary Algorithms: the Mijn operator, Research Institute on Parallel Information Systems, National Research Council of Italy, 2000.
[13] H. Kita, A comparison study of self-adaptation in evolution strategies and real-coded genetic algorithms, Evolutionary Computation, 9(2):223-241, 2001.
[14] F. Stoica, D. Simian, C. Simian, A new co-mutation genetic operator, In Proceedings of Applications of Evolutionary Computing in Modeling and Development of Intelligent Systems, within the 9th WSEAS International Conference on EVOLUTIONARY COMPUTING (EC '08), May 2008, Sofia, Bulgaria.