International Journal of Informatics and Communication Technology (IJ-ICT)
Vol. 6, No. 2, August 2017, pp. 105~116
ISSN: 2252-8776, DOI: 10.11591/ijict.v6i2.pp105-116
Journal homepage: http://iaesjournal.com/online/index.php/IJICT
A Markov Decision Model for Area Coverage in Autonomous
Demining Robot
Abdelhadi Larach*, Cherki Daoui, Mohamed Baslam
Laboratory of Information Processing and Decision Support, Sultan Moulay Slimane University Faculty of sciences and
Techniques, Beni-Mellal, Morocco
Article Info
Article history:
Received Feb 16, 2017
Revised Jun 29, 2017
Accepted Jul 19, 2017

ABSTRACT
A review of the literature shows that there is a variety of works studying coverage path planning in several autonomous robotic applications. In this work, we propose a new approach using a Markov Decision Process to plan an optimum path for the general goal of exploring an unknown environment containing buried mines. This approach, called the Goals to Goals Area Coverage on-line Algorithm, is based on a decomposition of the state space into smaller regions whose states are considered as goals with the same reward value; the reward value is decremented from one region to another according to the desired search mode. The numerical simulations show that our approach is promising for minimizing the energy cost needed to cover the entire area.
Keywords:
Coverage Path Planning
Markov Decision Process
Robotic Path planning
Shortest Path Planning
Copyright © 2017 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Abdelhadi Larach,
Laboratory of Information Processing and Decision Support,
Sultan Moulay Slimane University,
Faculty of sciences and Techniques, Beni-Mellal, Morocco.
Email: larachabdelhadi@gmail.com
1. INTRODUCTION
Shortest Path Planning (SPP) and Coverage Path Planning (CPP) are tasks used in a large number of robotic applications, such as demining robots [1], painter robots [2], cleaning robots [3], etc. Several works concerning SPP and CPP are presented in [4-6]. SPP and CPP algorithms are classified into two categories: off-line algorithms, generally used in a known environment, and on-line algorithms, used in an unknown environment. In on-line algorithms the policy is generally updated according to each new environment observation. The CPP problem remains a subject of optimization research, especially in an unknown environment.
In robotic landmine detection, the agent must detect and locate all possible mines, avoid all obstacles and follow the shortest path in an unknown environment without overlapping paths. Satisfying such requirements is not always easy or even possible. In fact, the whole structure of the environment is not known a priori, and the on-line algorithm must be able to seek the optimal strategy according to the knowledge acquired after each observation. E. Galceran [6] presented a survey on CPP in robotics in which several methods are reviewed.
In this paper, we present a Discounted Markov Decision Model for robotic navigation in a grid environment, together with a theoretical study that allows us to propose a new approach for area coverage, called the Goals to Goals Area Coverage on-line Algorithm. It is based on a decomposition of the state space into smaller regions whose states are all considered as goals and assigned the same reward value. The reward value can be decreased from one region to another according to the desired search mode, such as the line-sweep [13] or spatial cell diffusion [14] approaches.
2. MARKOV DECISION PROCESS
Markov Decision Processes are defined as controlled stochastic processes satisfying the Markov
property and assigning reward values to state transitions [7, 8]. Formally, they are defined by the five-tuple
(S, A, T, P, R), where S is the state space in which the process’s evolution takes place; A is the set of all
possible actions which control the state dynamics; T is the set of time steps where decisions need to be made;
P denotes the state transition probability function where P(St+1=j|St=i, At=a)=Piaj is the probability of
transitioning to a state j when an action a is executed in a state i, St (At) is a variable indicating the state
(action) at time t; R provides the reward function defined on state transitions where Ria denotes the reward
obtained if the action a is applied in state i.
2.1. Discounted Reward Markov Decision Process
Let P^π(S_t = j, A_t = a | S_0 = i) be the conditional probability that at time t the system is in state j and the action taken is a, given that the initial state is i and the decision maker follows a strategy π; if R_t denotes the reward at time t, then for any strategy π and initial state i, the expectation of R_t is given by:

\mathbb{E}_{\pi}(R_t, i) = \sum_{j \in S} \sum_{a \in A(j)} P^{\pi}(S_t = j, A_t = a \mid S_0 = i) \, R_{ja} \qquad (1)
In a discounted reward MDP, the value function, which is the expected reward when the process starts in state i and uses the policy π, is defined by:

V_{\pi}^{\alpha}(i) = \mathbb{E}\left[\sum_{t=0}^{\infty} \alpha^t \, \mathbb{E}_{\pi}(R_t, i)\right], \quad i \in S \qquad (2)

where α ∈ ]0, 1[ is the discount factor.
The objective is to determine V*, the maximum expected total discounted reward vector over an infinite horizon. It is well known [7, 8] that V* satisfies the Bellman equation:

V(i) = \max_{a \in A(i)} \left\{ R_{ia} + \alpha \sum_{j \in S} P_{iaj} \, V(j) \right\}, \quad i \in S \qquad (3)

Moreover, the actions attaining the maximum in (3) give rise to an optimal pure policy π* given by:

\pi^*(i) \in \arg\max_{a \in A(i)} \left\{ R_{ia} + \alpha \sum_{j \in S} P_{iaj} \, V^*(j) \right\}, \quad i \in S \qquad (4)
2.2. Gauss-Seidel Value Iteration Algorithm
The Gauss-Seidel Value Iteration (GSVI) Algorithm is one of the most widely used iterative algorithms for finding optimal or approximately optimal policies in Discounted MDPs [8-10]. In this paragraph, we present the following optimized pseudo-code of the GSVI Algorithm (Algorithm 1).
Algorithm 1. GSVI Algorithm.
GSVI (In: S, P, A, R, Γ_a^+, α, ε; Out: V*, π*)
1. For all i ∈ S Do V*(i) ← 0;   //Initialization
2. Bellman_err ← 2ε;   //For stopping criterion
3. While (Bellman_err ≥ ε) Do
      Bellman_err ← 0;   //Reset the error at each sweep
      For all i ∈ S Do   //Value improvement
         V_tmp ← max_{a ∈ A(i)} { R_ia + α · Σ_{j ∈ Γ_a^+(i)} P_iaj · V*(j) }
         Bellman_err ← Max(|V*(i) − V_tmp|, Bellman_err)
         V*(i) ← V_tmp
4. For all i ∈ S Do   //Policy calculation
      π*(i) ← argmax_{a ∈ A(i)} { R_ia + α · Σ_{j ∈ Γ_a^+(i)} P_iaj · V*(j) }
5. Return V*, π*
We denote by Γ_a^+(i) = { j ∈ S : P_iaj > 0 } the successor list of the pair (i, a), i ∈ S, a ∈ A(i).
Algorithm 1 is based on these successor lists of state-action pairs, which accelerates each iteration compared to the classical GSVI Algorithm, especially when the number of actions and successors per state is much smaller than the number of states. Indeed, the complexity of the proposed version is reduced to O(|Γ_a^+||S|) per iteration, where |Γ_a^+| is the average number of successors per state-action pair.
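For concreteness, the following Java sketch (Java is also the language of the simulations in Section 6) implements Algorithm 1 with successor lists. It is only an illustration under assumed data structures: the arrays actions, succ, prob and reward holding A(i), Γ_a^+(i), P_iaj and R_ia are our own layout, not the authors' implementation.

// Minimal sketch of Algorithm 1 (GSVI with successor lists).
public class GsviSolver {
    final int[][] actions;     // actions[i] = indices of the admissible actions A(i)
    final int[][][] succ;      // succ[i][a] = successor list Gamma_a^+(i)
    final double[][][] prob;   // prob[i][a][k] = P_iaj for the k-th successor j
    final double[][] reward;   // reward[i][a] = R_ia

    double[] value;            // V*
    int[] policy;              // pi*

    GsviSolver(int[][] actions, int[][][] succ, double[][][] prob, double[][] reward) {
        this.actions = actions; this.succ = succ; this.prob = prob; this.reward = reward;
    }

    // Q(i, a) = R_ia + alpha * sum over j in Gamma_a^+(i) of P_iaj * V(j)
    private double q(int i, int a, double alpha) {
        double s = reward[i][a];
        for (int k = 0; k < succ[i][a].length; k++)
            s += alpha * prob[i][a][k] * value[succ[i][a][k]];
        return s;
    }

    void solve(double alpha, double epsilon) {
        int n = actions.length;
        value = new double[n];                    // step 1: V*(i) <- 0
        policy = new int[n];
        double bellmanErr = 2 * epsilon;          // step 2
        while (bellmanErr >= epsilon) {           // step 3
            bellmanErr = 0.0;                     // reset at each sweep
            for (int i = 0; i < n; i++) {         // value improvement
                double vTmp = Double.NEGATIVE_INFINITY;
                for (int a : actions[i]) vTmp = Math.max(vTmp, q(i, a, alpha));
                bellmanErr = Math.max(Math.abs(value[i] - vTmp), bellmanErr);
                value[i] = vTmp;                  // Gauss-Seidel: reuse updated values
            }
        }
        for (int i = 0; i < n; i++) {             // step 4: policy calculation
            int best = actions[i][0];
            for (int a : actions[i])
                if (q(i, a, alpha) > q(i, best, alpha)) best = a;
            policy[i] = best;
        }
    }
}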
3. MARKOV DECISION MODEL FOR ROBOTIC NAVIGATION
To model robotic navigation in general, or the demining robot problem in particular, using an MDP, the five-tuple (S, A, T, P, R) must be defined and the environment representation must be chosen.
a. Grid Environment: The grid-based method, well suited to landmine detection, is used to model the environment, which is entirely discretized according to a regular grid. The grid size can be chosen according to the robot structure and the field covered by the robot sensor.
b. State Space: Using the grid method, the state space is therefore a set of grid cells; each cell has an associated value indicating whether it is an obstacle, mine, free or goal state.
c. Action Space: The robot can be controlled through nine actions: the eight compass directions and the action denoted by θ that keeps the process in its current state; actions that move the robot onto an obstacle or a mine state are eliminated; in a goal state the only possible action is θ. Figure 1 shows the possible actions in an environment example; the black cell is an obstacle state and the green cell is a goal state.
Figure 1. Environment example and possible robot actions in each state
Reward Function: A transition to a free or goal state is characterized by an energy cost proportional to the distance travelled; it is equal to a predefined constant x if the action is diagonal and to x/√2 if the action is horizontal or vertical, so the reward value is equal to -x or -x/√2 respectively; the reward assigned to the action θ in a free state is equal to zero, and for a goal state it is equal to a predefined constant Rb.
Transition Function: The transition function captures the uncertainty in the effects of actions; it is part of the problem data and can be estimated by reinforcement learning.
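To make the model concrete, the following sketch encodes the action set and reward function described above for a deterministic grid; the Cell enumeration, the state encoding and the helper names are illustrative assumptions, not part of the paper.

// Sketch of the grid MDP of Section 3 with deterministic transitions.
public class GridModel {
    public enum Cell { FREE, OBSTACLE, MINE, GOAL }

    // Eight compass moves (dr, dc); action index 8 is the "stay" action theta.
    static final int[][] MOVES = {
        {-1, 0}, {1, 0}, {0, -1}, {0, 1},       // N, S, W, E (horizontal/vertical)
        {-1, -1}, {-1, 1}, {1, -1}, {1, 1}      // NW, NE, SW, SE (diagonal)
    };
    static final int THETA = 8;

    // Reward of taking action a in cell (r, c):
    //   -x for a diagonal move, -x/sqrt(2) for a horizontal/vertical move,
    //   0 for theta in a free state, Rb for theta in a goal state.
    static double reward(Cell[][] grid, int r, int c, int a, double x, double rb) {
        if (a == THETA) return grid[r][c] == Cell.GOAL ? rb : 0.0;
        boolean diagonal = MOVES[a][0] != 0 && MOVES[a][1] != 0;
        return diagonal ? -x : -x / Math.sqrt(2);
    }

    // Admissible actions: theta is always allowed; a compass move is allowed only
    // if it stays inside the grid and does not enter an obstacle or a mine cell;
    // in a goal state the only possible action is theta.
    static boolean admissible(Cell[][] grid, int r, int c, int a) {
        if (a == THETA) return true;
        if (grid[r][c] == Cell.GOAL) return false;
        int nr = r + MOVES[a][0], nc = c + MOVES[a][1];
        if (nr < 0 || nr >= grid.length || nc < 0 || nc >= grid[0].length) return false;
        Cell target = grid[nr][nc];
        return target != Cell.OBSTACLE && target != Cell.MINE;
    }
}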
Robot Sensor: Several landmine detection methods can be used, such as metal detector technologies, electromagnetic methods [11], etc. In this work, we suppose that the robot can detect the buried mines in nearby states (Figure 2) using its own sensing system; an arm movement or a multi-sensor technology would cover these states.
Figure 2. States covered by the robot sensor
Robot Structure: Several robot mechanical structures can be used for a demining robot, such as omnidirectional or multi-directional conductive control structures [12], etc.
Remark 1. Using the proposed model, the time complexity of Algorithm 1 is O(|Γ_a^+||S'|) per iteration, where |S'| is the number of free states; for a goal state the expected reward is a constant equal to R_b/(1-α) (given by (3)), since the only possible action that can be taken in a goal state is θ; |Γ_a^+| is much smaller than |S'| and can be considered as constant, so Algorithm 1 is linear per iteration.
4. THEORETICAL MODEL STUDY
In this section, we present a theoretical study for the considered model, which is the basis of our
approach.
Proposition 1. If the state space does not contain any goal state, then the strategy π* defined by π*(s_0) = θ for every free state s_0 is an optimal strategy.
Proof. Suppose there exist an optimal pure strategy π* and a state s_0 ∈ S such that π*(s_0) ≠ θ; the expected reward when the system starts at state s_0 is:

V_{\pi^*}^{\alpha}(s_0) = V^*(s_0) = \mathbb{E}_{\pi^*}(R_0, s_0) + \mathbb{E}\left[\sum_{t=1}^{\infty} \alpha^t \, \mathbb{E}_{\pi^*}(R_t, s_0)\right] \qquad (5)

The expectation of R_t at time t = 0 under the strategy π* is given by:

\mathbb{E}_{\pi^*}(R_0, s_0) = \sum_{j \in S} \sum_{a \in A(j)} P^{\pi^*}(S_0 = s_0, A_0 = a \mid S_0 = s_0) \, R_{s_0 a} = R_{s_0 a} \qquad (6)

We have R_{s_0 a} ≤ -x/√2 and, since there is no goal state, every expectation E_{π*}(R_t, s_0) is non-positive; then V_{π*}^α(s_0) ≤ -x/√2 < 0. The fact that V_{π*}^α(s_0) < 0 implies that π*(s_0) cannot be an optimal action, since the strategy π' with π'(s_0) = θ gives V_{π'}^α(s_0) = 0, which contradicts the supposition.
Figure 3 (left) shows a simulation result in an environment example where no goal state is defined; as we can see, π*(s_0) = θ for all s_0 ∈ S.
Proposition 2. Let s_0 be a free initial state; if there is no path leading from s_0 to the goal state G, then π*(s_0) = θ.
Proof. The proof is similar to that of Proposition 1; indeed, for any strategy π such that π(s_0) ≠ θ, V_π^α(s_0) < 0.
Figure 3 (right) shows an example of a simulation result; as we can see, π*(s_0) = θ for all states s_0 ∈ S from which there is no path to the goal state G.
Figure 3. Examples of optimal strategies: no goal state (left) and no path to the goal state from some states (right)
Proposition 3. Let G be a goal state with reward value Rb = 0; then for any initial free state s_0, π*(s_0) = θ.
Proof. This is clear, since for any strategy π such that π(s_0) ≠ θ, V_π^α(s_0) < 0.
Figure 4 shows a simulation example in an environment containing four goal states with reward value Rb = 0; as can be seen, π*(s_0) = θ for every free initial state.
Figure 4. Optimal strategy in an environment example with four goal states with reward value Rb = 0
Proposition 4. Let s_0 be a free initial state, suppose that there exists a path of length l from s_0 to the goal state G, and let R_b be the reward value obtained when the action θ is applied in the goal state G.

If \; R_b > \frac{x}{\alpha^l} - x, \; then \; \pi^*(s_0) \neq \theta \qquad (7)
Proof. Let V*(s_0) be the expected reward when the process starts at state s_0:

V^*(s_0) = \sum_{t=0}^{\infty} \alpha^t R_t = \sum_{t=0}^{l-1} \alpha^t R_f^t + \sum_{t=l}^{\infty} \alpha^t R_b \qquad (8)

where R_f^t = \mathbb{E}_{\pi^*}(R_t, s_0) = \sum_{j \in S} \sum_{a \in A(j)} P^{\pi^*}(S_t = j, A_t = a \mid S_0 = s_0) \, R_{ja} is the expectation of the reward obtained when some compass action is taken in the state S_t at time t.
Equation (8) can be rewritten as follows:

V^*(s_0) = \sum_{t=0}^{l-1} \alpha^t R_f^t + \alpha^l \sum_{t=0}^{\infty} \alpha^t R_b \qquad (9)
Let V*(s_g) be the expected reward when the process starts at the goal state G; using (3), we have:

V^*(s_g) = \sum_{t=0}^{\infty} \alpha^t R_b = \frac{R_b}{1-\alpha} \qquad (10)
Using (10), equality (9) can be rewritten as follows:

V^*(s_0) = \sum_{t=0}^{l-1} \alpha^t R_f^t + \frac{R_b \, \alpha^l}{1-\alpha} \qquad (11)
The fact that R_{ja} ≥ -x implies that R_f^t ≥ -x; then:

\sum_{t=0}^{l-1} \alpha^t R_f^t \ge -x \sum_{t=0}^{l-1} \alpha^t \qquad (12)
By applying (12) in (11), we obtain:

V^*(s_0) \ge \frac{R_b \, \alpha^l}{1-\alpha} - x \sum_{t=0}^{l-1} \alpha^t \qquad (13)
Since \sum_{t=0}^{l-1} \alpha^t = \frac{1-\alpha^l}{1-\alpha}, equation (13) becomes:

V^*(s_0) \ge \frac{R_b \, \alpha^l - x + x \, \alpha^l}{1-\alpha} \qquad (14)
The fact that R_b > \frac{x}{\alpha^l} - x implies that R_b \alpha^l - x + x \alpha^l > 0, so (14) implies that V*(s_0) is greater than zero:

V^*(s_0) \ge \frac{R_b \, \alpha^l - x + x \, \alpha^l}{1-\alpha} > 0 \qquad (15)

The fact that V*(s_0) > 0 implies that π*(s_0) ≠ θ.
Figure 5 shows a strategy example generated using Algorithm 1 with parameters Rb = 200, α = 0.2 and x = 1; for a path of length l = 3, the threshold of (7) is 1/α^3 - 1 = 124 < Rb = 200, while for l = 4 it is 1/α^4 - 1 = 624 > Rb = 200; we can conclude from these two values that if l < 4 then π*(s_0) ≠ θ.
Figure 5. An optimal strategy example using parameters: α=0.2, x=1 and Rb=200
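The bound in (7) is easy to check numerically; the following small snippet, given only as an illustration and not part of the paper's code, reproduces the threshold values discussed for Figure 5.

// Numerical check of the threshold of Proposition 4 for the parameters of Figure 5.
public class Prop4Threshold {
    // The goal reward must exceed x/alpha^l - x for a free state at path
    // length l from the goal to prefer moving over the stay action theta.
    static double threshold(double x, double alpha, int l) {
        return x / Math.pow(alpha, l) - x;
    }
    public static void main(String[] args) {
        double alpha = 0.2, x = 1.0, rb = 200.0;   // parameters of Figure 5
        for (int l = 1; l <= 5; l++)
            System.out.printf("l=%d  threshold=%.1f  Rb>threshold? %b%n",
                              l, threshold(x, alpha, l), rb > threshold(x, alpha, l));
        // Prints thresholds 4.0, 24.0, 124.0, 624.0, 3124.0:
        // Rb = 200 exceeds the threshold only for l <= 3.
    }
}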
Proposition 5. Let G be a goal state with reward value R_b = \frac{x}{\alpha^{|S|}}. For any initial free state s_0, if there exists a path to the goal state G, then π*(s_0) ≠ θ.
Proof. Let l be the length of the path to the goal state G; using Proposition 4, if R_b > \frac{x}{\alpha^l} - x then π*(s_0) ≠ θ. Let us show that \frac{x}{\alpha^{|S|}} > \frac{x}{\alpha^l} - x: we have |S| > l, so \alpha^{|S|} < \alpha^l and \frac{x}{\alpha^{|S|}} > \frac{x}{\alpha^l} > \frac{x}{\alpha^l} - x, since x > 0.
Figure 6 shows a path strategy example with parameters α = 0.2, x = α^|S| and Rb = 1 (note that if x = α^|S| then Rb = x/α^|S| = 1); as can be seen, π*(s_0) ≠ θ for every free initial state s_0.
Figure 6. An optimal strategy example using parameters: α = 0.2, x = α^|S| and Rb = 1
Proposition 6. Let G1 and G2 be goal states such that R_b^1 = 1 and R_b^2 = 1 + \frac{1}{\alpha^{|S|}}. Suppose there exists a path of length l_1 (resp. l_2) from the initial state s_0 to the goal state G1 (resp. G2); then the optimal strategy navigates the robot to the goal state G2 for all l_1 ≥ 1.
Proof. Suppose there exists an optimal strategy π_1^* that moves the robot to the goal state G1. Equation (11) implies that:

V_{\pi_1^*}^{\alpha}(s_0) < \frac{\alpha \, R_b^1}{1-\alpha} = \frac{\alpha}{1-\alpha} < \frac{1}{1-\alpha} \qquad (16)
Let π_2 be a strategy that moves the robot to the goal state G2; equation (15) and the fact that l_2 < |S| imply that:

V_{\pi_2}^{\alpha}(s_0) \ge \frac{R_b^2 \, \alpha^{l_2} - x + x \, \alpha^{l_2}}{1-\alpha} > \frac{R_b^2 \, \alpha^{|S|} - x + x \, \alpha^{|S|}}{1-\alpha} \qquad (17)
By using the fact that R_b^2 = 1 + \frac{1}{\alpha^{|S|}} and x = \alpha^{|S|}, inequality (17) can be rewritten as:

V_{\pi_2}^{\alpha}(s_0) > \frac{\left(1 + \frac{1}{\alpha^{|S|}}\right)\alpha^{|S|} - x + x \, \alpha^{|S|}}{1-\alpha} = \frac{1 + \alpha^{2|S|}}{1-\alpha} > \frac{1}{1-\alpha} \qquad (18)
Inequalities (16) and (18) imply that:

V_{\pi_2}^{\alpha}(s_0) > \frac{1 + \alpha^{2|S|}}{1-\alpha} > \frac{1}{1-\alpha} > V_{\pi_1^*}^{\alpha}(s_0) \qquad (19)
The fact that V_{π_2}^α(s_0) > V_{π_1^*}^α(s_0) implies that π_1^* cannot be an optimal strategy.
5. GOALS TO GOALS AREA COVERAGE ALGORITHMS
In this section, we present the proposed on-line algorithms for area coverage by an autonomous demining robot, based on the model defined in Section 3 and the theoretical study presented in the previous section.
We begin with the first version (Algorithm 2), called Goals to Goals Randomized Area Coverage, in which we suppose that all states are goals with the same reward value, except the initial state. Starting from its initial position, the robot detects the nearby states, updates the detected states, computes a path strategy using Algorithm 1 and moves according to an optimal action; these steps are repeated until the optimal action is equal to θ.
Algorithm 2. Goals to Goals Randomized Area Coverage.
Data: S, P, A, R, Γ_a^+, α, ε; x = α^|S|; s_0: initial state
1. Set all states as goals (except the initial state) with the same reward Rb = 1.
2. Repeat
   2.1. Observe nearby states with the sensor
   2.2. Update the observed states
   2.3. Calculate a strategy using Algorithm 1
   2.4. Move the robot using an optimal action
   Until (the optimal action is equal to θ)
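The on-line loop of Algorithm 2 can be sketched as follows; the Robot and GridMdp interfaces and their methods (observeNearbyStates, updateState, optimalPolicy) are hypothetical placeholders for the sensor, the model update and Algorithm 1, not part of the authors' code.

// Sketch of the on-line loop of Algorithm 2 (Goals to Goals Randomized Area Coverage).
public class GoalsToGoalsCoverage {
    // Hypothetical interfaces; their concrete implementations are outside this sketch.
    interface Robot {
        int position();                      // current state index
        int[] observeNearbyStates();         // states covered by the sensor
        void moveTo(int action);             // execute one compass move
    }
    interface GridMdp {
        int THETA = 8;                       // index of the "stay" action
        void setAllGoals(double rb, int exceptState);
        void updateState(int state);         // mark observed cell as mine/obstacle/free
        int[] optimalPolicy(double alpha, double epsilon);   // Algorithm 1 (GSVI)
    }

    static void run(GridMdp model, Robot robot, double alpha, double epsilon) {
        model.setAllGoals(1.0, robot.position());            // step 1: Rb = 1
        while (true) {
            for (int s : robot.observeNearbyStates())        // steps 2.1-2.2
                model.updateState(s);
            int[] policy = model.optimalPolicy(alpha, epsilon);   // step 2.3
            int a = policy[robot.position()];
            // Theta is optimal only when no reachable, unexplored goal remains
            // (Propositions 1 and 2), so the coverage is finished.
            if (a == GridMdp.THETA) break;
            robot.moveTo(a);                                 // step 2.4
        }
    }
}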
Theorem. Algorithm 2 works correctly and terminates after covering the entire area, except the regions that are not reachable from the start state s_0.
Proof. The proof follows from Propositions 1, 2, 4 and 5. In fact, Propositions 1 and 2 imply that if there is at least one state that is not yet explored and is reachable from the current robot position, the optimal action is different from θ. Moreover, after each observation, the status of the nearby states (supposed goals) is updated; navigation to the nearest goal is ensured by Propositions 4 and 5, and the robot stops after exploring the entire environment except the states unreachable from the start state s_0 (Proposition 2).
Figure 7 shows a path generated using Algorithm 2 in an obstacle-free environment (with ten randomly placed mines); as we can see, the entire area is covered, but the search path is pseudo-random and the robot overlaps some regions previously covered.
Figure 7. Environment example and path generated using algorithm 2
To reduce the overlapping paths, we propose a second version (Algorithm 3) based on decreasing the reward values according to some search mode, such as the line-sweep based approach (Figure 8) described in [13] or the spatial cell diffusion approach presented in [14].
In each smaller region (for example, in Figure 8 a smaller region contains nine states), all states are considered as goals with the same fixed reward value, and the search order between regions is ensured by Proposition 6.
Figure 8. Example of reward values decremented according to the line-sweep search mode
Algorithm 3. Goals to Goals Search Mode Area Coverage.
Data: S, P, A, R, Γ_a^+, α, ε; x = α^|S|; s_0: initial state
1. Decompose the state space into p smaller regions.
2. Define the decreasing reward values according to the desired search mode.
3. For each smaller region i = p, ..., 1 Do Set all states in region i as goals with R_i = 1 + i/α^|S|.
4. Repeat
   4.1. Observe nearby states
   4.2. Update the observed states
   4.3. Calculate a strategy using Algorithm 1
   4.4. Move the robot using an optimal action
   Until (the optimal action is equal to θ)
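As an illustration of steps 1-3, the following sketch assigns the decreasing region rewards R_i = 1 + i/α^|S| for a column-wise line-sweep decomposition of a rectangular grid; the band decomposition and its ordering are assumptions chosen to mimic Figure 8, not the authors' implementation.

// Sketch of steps 1-3 of Algorithm 3 for a column-wise line-sweep search mode.
public class LineSweepRewards {
    // Regions are numbered p, p-1, ..., 1 along the sweep direction and each cell
    // receives the reward R_i = 1 + i / alpha^|S| of its region, so the region to
    // be visited first carries the largest goal reward (Proposition 6).
    static double[][] regionRewards(int rows, int cols, int bandWidth, double alpha) {
        int p = (cols + bandWidth - 1) / bandWidth;      // number of regions
        double base = Math.pow(alpha, rows * cols);      // alpha^|S|
        // Note: for large grids alpha^|S| underflows; all rewards (including x)
        // can be rescaled by a common positive factor without changing the policy.
        double[][] reward = new double[rows][cols];
        for (int c = 0; c < cols; c++) {
            int band = c / bandWidth;                    // 0 .. p-1 along the sweep
            int i = p - band;                            // region index p, ..., 1
            for (int r = 0; r < rows; r++)
                reward[r][c] = 1.0 + i / base;           // R_i = 1 + i / alpha^|S|
        }
        return reward;
    }
}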
6. SIMULATION RESULTS
The proposed algorithms are simulated using a JAVA implementation; Figures 9, 10 and 11 show the simulation results for several search modes in an obstacle-free environment example with ten randomly placed buried mines. It can be seen from Figures 9, 10 and 11 that Algorithm 3 works correctly with a low repetition rate compared to Algorithm 2.
In some cases it is preferable that the robot finishes the area coverage near its initial position; the variant of the line-sweep search mode (Figure 11) can then be used.
To evaluate these search modes, we can use the following valuation function:

E = \sum_{t=0}^{T} \frac{|R_{s_t}|}{x} = \sum_{t=0}^{T} C_{s_t} \qquad (20)

where C_{s_t} = \frac{1}{\sqrt{2}} or 1 and T is the number of decisions needed to complete the area coverage; this valuation function is proportional to the time required for the entire exploration; the number of rotations can also be added to this valuation function.
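A small helper, shown only as an illustration, computes the valuation (20) from the sequence of executed moves, counting 1 for a diagonal move and 1/√2 for a horizontal or vertical move.

// Illustrative helper for the valuation E of (20).
public class CoverageCost {
    static double valuation(boolean[] isDiagonalMove) {
        double e = 0.0;
        for (boolean diagonal : isDiagonalMove)
            e += diagonal ? 1.0 : 1.0 / Math.sqrt(2);
        return e;   // proportional to the time/energy needed for the coverage
    }
}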
Table 1. Valuation Functions: Comparison of Several Search Modes

Search mode                 Ē
Randomized                 100
Spatial Cell Diffusion      83
Line Sweep                  80
Variant of Line Sweep       82
Table 1 presents the average value of E for each search mode over several irregular buried-mine layouts generated randomly in the same environment example. As can be seen, these search modes achieve the area coverage of an obstacle-free environment with a low energy cost, especially the Line Sweep search mode, which is also beneficial in the decomposition method for real environments with obstacles.
Figure 9. Online strategy generated using algorithm 3 for spatial cell diffusion search mode
Figure 10. Online strategy generated using algorithm 3 for Line-Sweep search mode
Figure 11. Online strategy generated using algorithm 3 for a variant of Line-Sweep search mode
Remark 2. Area coverage without overlapping paths is not always easy or even possible, especially in an unknown environment. Figure 12 shows a strategy example with some overlapping paths.
Figure 12. Online strategy example with some overlapping paths
Remark 3. For a large state space and a real environment with obstacles, a line-sweep decomposition into monotone subregions can be used [13]; in each subregion, an adequate line-sweep search mode is used to ensure optimal area coverage. The robot explores the subregions one by one; once a subregion has been covered, its states can be changed into goal states with reward Rb = 0, so that there is no return to the explored region (Proposition 3), and the decision time of Algorithm 1 can thus be reduced towards real time, since it is proportional to the number of free states (Remark 1).
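A minimal sketch of this reward reset, under the same grid-array assumptions as the earlier snippets, is the following:

// Illustrative sketch for Remark 3: once a subregion has been covered, all of its
// states become goal states with reward Rb = 0, so the optimal action there is theta
// and the robot never re-enters the region (Proposition 3).
public class SubregionClosing {
    static void closeSubregion(double[][] goalReward, int rowFrom, int rowTo,
                               int colFrom, int colTo) {
        for (int r = rowFrom; r <= rowTo; r++)
            for (int c = colFrom; c <= colTo; c++)
                goalReward[r][c] = 0.0;   // Rb = 0 for every state of the covered subregion
    }
}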
Figure 13 shows an environment example with obstacles divided into two subregions; after exploring the first (left) subregion, its states are transformed into goals with a reward value equal to zero; the path generated is shown in Figure 14.
Figure 13. Environment example with obstacle (blue region) divided into two subregions
Figure 14. Online strategy generated for the environment example in figure 13
Remark 4. To reduce memory consumption in a large state space, each subregion can be aggregated into a single state before and after its exploration; only the active subregion is disaggregated.
7. CONCLUSIONS
In this work, we have presented a Discounted Markov Decision Model for robotic navigation in a grid environment, obtained by adding a fictitious action in each state, which is the key to the simplicity of the proposed algorithms; the use of this model is certainly not limited to autonomous landmine-detection robots, and it can be applied in several other robotic applications. We have also presented a theoretical study of the Discounted MDP in the proposed model to ensure an optimal path strategy. This theoretical study allowed us to prove the correctness of the proposed new approach for finding a best on-line path strategy to detect buried mines in an obstacle-free environment where hidden mines are randomly distributed in different positions.
The simulation results show that our approach is encouraging and promising for extension, in future work, to different types of environments and to the Partially Observable MDP setting, where the robot position is stochastic.
REFERENCES
[1] H. Najjaran and N. Kircanski, "Path Planning for a Terrain Scanner Robot", in Proc. 31st Int. Symp. Robotics, Montreal, QC, Canada, 2000, pp. 132-137.
[2] P. Atkar et al., "Uniform coverage of automotive surface patches", The International Journal of Robotics Research, 24(11), pp. 883-898, 2005.
[3] J. S. Oh et al., "Complete Coverage Navigation of Cleaning Robots using Triangular-Cell-Based Map", IEEE Transactions on Industrial Electronics, vol. 51, no. 3, pp. 718-726, 2004.
[4] P. Tokekar, N. Karnad, and V. Isler, "Energy-Optimal Trajectory Planning for Car-Like Robots", Autonomous Robots, vol. 37, no. 3, pp. 279-300, 2014.
[5] J. W. Kang, S. J. Kim, and M. J. Chung, "Path Planning for Complete and Efficient Coverage Operation of Mobile Robots", IEEE International Conference on Mechatronics and Automation, pp. 2126-2131, 2007.
[6] E. Galceran and M. Carreras, "A Survey on Coverage Path Planning for Robotics", Robotics and Autonomous Systems, 61, pp. 1258-1276, 2013.
[7] R. E. Bellman, "Dynamic Programming", Princeton University Press, Princeton, NJ, 1957.
[8] M. Puterman, "Markov Decision Processes: Discrete Stochastic Dynamic Programming", John Wiley & Sons, Inc., New York, USA, 1994.
[9] D. J. White, "Markov Decision Processes", John Wiley & Sons, Inc., New York, 1994.
[10] D. P. Bertsekas, "Dynamic Programming and Optimal Control", Belmont: Athena Scientific, 2001.
[11] C. P. Gooneratne, S. C. Mukhopahyay, and G. Sen Gupta, "A Review of Sensing Technologies for Landmine Detection: Unmanned Vehicle Based Approach", in Proc. 2nd International Conference on Autonomous Robots and Agents, December 13-15, 2004, Palmerston North, New Zealand.
[12] K. Suresh, K. Vidyasagar, and A. F. Basha, "Multi Directional Conductive Metal Detection Robot Control", International Journal of Computer Applications, vol. 109, no. 4, 2015.
[13] W. H. Huang, "Optimal Line-Sweep-Based Decompositions for Coverage Algorithms", in Proc. 2001 IEEE International Conference on Robotics and Automation (ICRA), vol. 1, pp. 27-32, 2001.
[14] S. W. Ryu et al., "A Search and Coverage Algorithm for Mobile Robot", in Proc. 8th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), pp. 815-821, 2011.

Weitere ähnliche Inhalte

Was ist angesagt?

Comparison of various heuristic
Comparison of various heuristicComparison of various heuristic
Comparison of various heuristicijaia
 
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMTHE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMIJCSEA Journal
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringAllen Wu
 
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMCOMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMcsitconf
 
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMCOMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMcscpconf
 
CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics
CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics
CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics Computational Materials Science Initiative
 
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...Abimbola Ogundipe
 
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market DataBoosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market DataJay (Jianqiang) Wang
 
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTIONA COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTIONijsc
 
International Journal of Computational Science and Information Technology (...
  International Journal of Computational Science and Information Technology (...  International Journal of Computational Science and Information Technology (...
International Journal of Computational Science and Information Technology (...ijcsity
 
Firefly Algorithm: Recent Advances and Applications
Firefly Algorithm: Recent Advances and ApplicationsFirefly Algorithm: Recent Advances and Applications
Firefly Algorithm: Recent Advances and ApplicationsXin-She Yang
 
Scheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic AlgorithmScheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic Algorithmiosrjce
 
saad faim paper3
saad faim paper3saad faim paper3
saad faim paper3Saad Farooq
 

Was ist angesagt? (18)

Comparison of various heuristic
Comparison of various heuristicComparison of various heuristic
Comparison of various heuristic
 
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMTHE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
 
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMCOMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
 
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMCOMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
 
C42011318
C42011318C42011318
C42011318
 
CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics
CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics
CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics
 
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
 
40120140501004
4012014050100440120140501004
40120140501004
 
Particle Swarm Optimization Algorithm Based Window Function Design
Particle Swarm Optimization Algorithm Based Window Function DesignParticle Swarm Optimization Algorithm Based Window Function Design
Particle Swarm Optimization Algorithm Based Window Function Design
 
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market DataBoosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data
 
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTIONA COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
 
International Journal of Computational Science and Information Technology (...
  International Journal of Computational Science and Information Technology (...  International Journal of Computational Science and Information Technology (...
International Journal of Computational Science and Information Technology (...
 
1308.3898
1308.38981308.3898
1308.3898
 
Firefly Algorithm: Recent Advances and Applications
Firefly Algorithm: Recent Advances and ApplicationsFirefly Algorithm: Recent Advances and Applications
Firefly Algorithm: Recent Advances and Applications
 
Scheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic AlgorithmScheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic Algorithm
 
saad faim paper3
saad faim paper3saad faim paper3
saad faim paper3
 

Ähnlich wie 5. 8519 1-pb

USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...IJCSEA Journal
 
ROBUST MULTISENSOR FRAMEWORK FOR MOBILE ROBOT NAVIGATION IN GNSS-DENIED ENVIR...
ROBUST MULTISENSOR FRAMEWORK FOR MOBILE ROBOT NAVIGATION IN GNSS-DENIED ENVIR...ROBUST MULTISENSOR FRAMEWORK FOR MOBILE ROBOT NAVIGATION IN GNSS-DENIED ENVIR...
ROBUST MULTISENSOR FRAMEWORK FOR MOBILE ROBOT NAVIGATION IN GNSS-DENIED ENVIR...Steffi Keran Rani J
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningRyo Iwaki
 
Optimized Robot Path Planning Using Parallel Genetic Algorithm Based on Visib...
Optimized Robot Path Planning Using Parallel Genetic Algorithm Based on Visib...Optimized Robot Path Planning Using Parallel Genetic Algorithm Based on Visib...
Optimized Robot Path Planning Using Parallel Genetic Algorithm Based on Visib...IJERA Editor
 
A deep reinforcement learning strategy for autonomous robot flocking
A deep reinforcement learning strategy for autonomous robot flockingA deep reinforcement learning strategy for autonomous robot flocking
A deep reinforcement learning strategy for autonomous robot flockingIJECEIAES
 
Location Update In Mobile Ad Hoc Network Using Markov Model
Location Update In Mobile Ad Hoc Network Using Markov ModelLocation Update In Mobile Ad Hoc Network Using Markov Model
Location Update In Mobile Ad Hoc Network Using Markov ModelEditor IJMTER
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...IJITCA Journal
 
Motion planning and controlling algorithm for grasping and manipulating movin...
Motion planning and controlling algorithm for grasping and manipulating movin...Motion planning and controlling algorithm for grasping and manipulating movin...
Motion planning and controlling algorithm for grasping and manipulating movin...ijscai
 
Path Planning And Navigation
Path Planning And NavigationPath Planning And Navigation
Path Planning And Navigationguest90654fd
 
Path Planning And Navigation
Path Planning And NavigationPath Planning And Navigation
Path Planning And Navigationguest90654fd
 
REINFORCEMENT LEARNING
REINFORCEMENT LEARNINGREINFORCEMENT LEARNING
REINFORCEMENT LEARNINGpradiprahul
 
Fuzzy Genetic Algorithm Approach for Verification of Reachability and Detect...
Fuzzy Genetic Algorithm Approach for Verification  of Reachability and Detect...Fuzzy Genetic Algorithm Approach for Verification  of Reachability and Detect...
Fuzzy Genetic Algorithm Approach for Verification of Reachability and Detect...Dr. Amir Mosavi, PhD., P.Eng.
 
Using Genetic Algorithm for Shortest Path Selection with Real Time Traffic Flow
Using Genetic Algorithm for Shortest Path Selection with Real Time Traffic FlowUsing Genetic Algorithm for Shortest Path Selection with Real Time Traffic Flow
Using Genetic Algorithm for Shortest Path Selection with Real Time Traffic FlowIJCSIS Research Publications
 
Strategic Oscillation for Exploitation and Exploration of ACS Algorithm for J...
Strategic Oscillation for Exploitation and Exploration of ACS Algorithm for J...Strategic Oscillation for Exploitation and Exploration of ACS Algorithm for J...
Strategic Oscillation for Exploitation and Exploration of ACS Algorithm for J...University Utara Malaysia
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...IJITCA Journal
 
Iaetsd modified artificial potential fields algorithm for mobile robot path ...
Iaetsd modified  artificial potential fields algorithm for mobile robot path ...Iaetsd modified  artificial potential fields algorithm for mobile robot path ...
Iaetsd modified artificial potential fields algorithm for mobile robot path ...Iaetsd Iaetsd
 

Ähnlich wie 5. 8519 1-pb (20)

USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
 
ROBUST MULTISENSOR FRAMEWORK FOR MOBILE ROBOT NAVIGATION IN GNSS-DENIED ENVIR...
ROBUST MULTISENSOR FRAMEWORK FOR MOBILE ROBOT NAVIGATION IN GNSS-DENIED ENVIR...ROBUST MULTISENSOR FRAMEWORK FOR MOBILE ROBOT NAVIGATION IN GNSS-DENIED ENVIR...
ROBUST MULTISENSOR FRAMEWORK FOR MOBILE ROBOT NAVIGATION IN GNSS-DENIED ENVIR...
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
Optimized Robot Path Planning Using Parallel Genetic Algorithm Based on Visib...
Optimized Robot Path Planning Using Parallel Genetic Algorithm Based on Visib...Optimized Robot Path Planning Using Parallel Genetic Algorithm Based on Visib...
Optimized Robot Path Planning Using Parallel Genetic Algorithm Based on Visib...
 
frozen_lake_rl_report.pdf
frozen_lake_rl_report.pdffrozen_lake_rl_report.pdf
frozen_lake_rl_report.pdf
 
A deep reinforcement learning strategy for autonomous robot flocking
A deep reinforcement learning strategy for autonomous robot flockingA deep reinforcement learning strategy for autonomous robot flocking
A deep reinforcement learning strategy for autonomous robot flocking
 
Location Update In Mobile Ad Hoc Network Using Markov Model
Location Update In Mobile Ad Hoc Network Using Markov ModelLocation Update In Mobile Ad Hoc Network Using Markov Model
Location Update In Mobile Ad Hoc Network Using Markov Model
 
Reinforcement Learning - DQN
Reinforcement Learning - DQNReinforcement Learning - DQN
Reinforcement Learning - DQN
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
 
Motion planning and controlling algorithm for grasping and manipulating movin...
Motion planning and controlling algorithm for grasping and manipulating movin...Motion planning and controlling algorithm for grasping and manipulating movin...
Motion planning and controlling algorithm for grasping and manipulating movin...
 
Path Planning And Navigation
Path Planning And NavigationPath Planning And Navigation
Path Planning And Navigation
 
Path Planning And Navigation
Path Planning And NavigationPath Planning And Navigation
Path Planning And Navigation
 
50120130405020
5012013040502050120130405020
50120130405020
 
Ica 2013021816274759
Ica 2013021816274759Ica 2013021816274759
Ica 2013021816274759
 
REINFORCEMENT LEARNING
REINFORCEMENT LEARNINGREINFORCEMENT LEARNING
REINFORCEMENT LEARNING
 
Fuzzy Genetic Algorithm Approach for Verification of Reachability and Detect...
Fuzzy Genetic Algorithm Approach for Verification  of Reachability and Detect...Fuzzy Genetic Algorithm Approach for Verification  of Reachability and Detect...
Fuzzy Genetic Algorithm Approach for Verification of Reachability and Detect...
 
Using Genetic Algorithm for Shortest Path Selection with Real Time Traffic Flow
Using Genetic Algorithm for Shortest Path Selection with Real Time Traffic FlowUsing Genetic Algorithm for Shortest Path Selection with Real Time Traffic Flow
Using Genetic Algorithm for Shortest Path Selection with Real Time Traffic Flow
 
Strategic Oscillation for Exploitation and Exploration of ACS Algorithm for J...
Strategic Oscillation for Exploitation and Exploration of ACS Algorithm for J...Strategic Oscillation for Exploitation and Exploration of ACS Algorithm for J...
Strategic Oscillation for Exploitation and Exploration of ACS Algorithm for J...
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
 
Iaetsd modified artificial potential fields algorithm for mobile robot path ...
Iaetsd modified  artificial potential fields algorithm for mobile robot path ...Iaetsd modified  artificial potential fields algorithm for mobile robot path ...
Iaetsd modified artificial potential fields algorithm for mobile robot path ...
 

Mehr von IAESIJEECS

08 20314 electronic doorbell...
08 20314 electronic doorbell...08 20314 electronic doorbell...
08 20314 electronic doorbell...IAESIJEECS
 
07 20278 augmented reality...
07 20278 augmented reality...07 20278 augmented reality...
07 20278 augmented reality...IAESIJEECS
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...IAESIJEECS
 
05 20275 computational solution...
05 20275 computational solution...05 20275 computational solution...
05 20275 computational solution...IAESIJEECS
 
04 20268 power loss reduction ...
04 20268 power loss reduction ...04 20268 power loss reduction ...
04 20268 power loss reduction ...IAESIJEECS
 
03 20237 arduino based gas-
03 20237 arduino based gas-03 20237 arduino based gas-
03 20237 arduino based gas-IAESIJEECS
 
02 20274 improved ichi square...
02 20274 improved ichi square...02 20274 improved ichi square...
02 20274 improved ichi square...IAESIJEECS
 
01 20264 diminution of real power...
01 20264 diminution of real power...01 20264 diminution of real power...
01 20264 diminution of real power...IAESIJEECS
 
08 20272 academic insight on application
08 20272 academic insight on application08 20272 academic insight on application
08 20272 academic insight on applicationIAESIJEECS
 
07 20252 cloud computing survey
07 20252 cloud computing survey07 20252 cloud computing survey
07 20252 cloud computing surveyIAESIJEECS
 
06 20273 37746-1-ed
06 20273 37746-1-ed06 20273 37746-1-ed
06 20273 37746-1-edIAESIJEECS
 
05 20261 real power loss reduction
05 20261 real power loss reduction05 20261 real power loss reduction
05 20261 real power loss reductionIAESIJEECS
 
04 20259 real power loss
04 20259 real power loss04 20259 real power loss
04 20259 real power lossIAESIJEECS
 
03 20270 true power loss reduction
03 20270 true power loss reduction03 20270 true power loss reduction
03 20270 true power loss reductionIAESIJEECS
 
02 15034 neural network
02 15034 neural network02 15034 neural network
02 15034 neural networkIAESIJEECS
 
01 8445 speech enhancement
01 8445 speech enhancement01 8445 speech enhancement
01 8445 speech enhancementIAESIJEECS
 
08 17079 ijict
08 17079 ijict08 17079 ijict
08 17079 ijictIAESIJEECS
 
07 20269 ijict
07 20269 ijict07 20269 ijict
07 20269 ijictIAESIJEECS
 
06 10154 ijict
06 10154 ijict06 10154 ijict
06 10154 ijictIAESIJEECS
 
05 20255 ijict
05 20255 ijict05 20255 ijict
05 20255 ijictIAESIJEECS
 

Mehr von IAESIJEECS (20)

08 20314 electronic doorbell...
08 20314 electronic doorbell...08 20314 electronic doorbell...
08 20314 electronic doorbell...
 
07 20278 augmented reality...
07 20278 augmented reality...07 20278 augmented reality...
07 20278 augmented reality...
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...
 
05 20275 computational solution...
05 20275 computational solution...05 20275 computational solution...
05 20275 computational solution...
 
04 20268 power loss reduction ...
04 20268 power loss reduction ...04 20268 power loss reduction ...
04 20268 power loss reduction ...
 
03 20237 arduino based gas-
03 20237 arduino based gas-03 20237 arduino based gas-
03 20237 arduino based gas-
 
02 20274 improved ichi square...
02 20274 improved ichi square...02 20274 improved ichi square...
02 20274 improved ichi square...
 
01 20264 diminution of real power...
01 20264 diminution of real power...01 20264 diminution of real power...
01 20264 diminution of real power...
 
08 20272 academic insight on application
08 20272 academic insight on application08 20272 academic insight on application
08 20272 academic insight on application
 
07 20252 cloud computing survey
07 20252 cloud computing survey07 20252 cloud computing survey
07 20252 cloud computing survey
 
06 20273 37746-1-ed
06 20273 37746-1-ed06 20273 37746-1-ed
06 20273 37746-1-ed
 
05 20261 real power loss reduction
05 20261 real power loss reduction05 20261 real power loss reduction
05 20261 real power loss reduction
 
04 20259 real power loss
04 20259 real power loss04 20259 real power loss
04 20259 real power loss
 
03 20270 true power loss reduction
03 20270 true power loss reduction03 20270 true power loss reduction
03 20270 true power loss reduction
 
02 15034 neural network
02 15034 neural network02 15034 neural network
02 15034 neural network
 
01 8445 speech enhancement
01 8445 speech enhancement01 8445 speech enhancement
01 8445 speech enhancement
 
08 17079 ijict
08 17079 ijict08 17079 ijict
08 17079 ijict
 
07 20269 ijict
07 20269 ijict07 20269 ijict
07 20269 ijict
 
06 10154 ijict
06 10154 ijict06 10154 ijict
06 10154 ijict
 
05 20255 ijict
05 20255 ijict05 20255 ijict
05 20255 ijict
 

Kürzlich hochgeladen

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Kürzlich hochgeladen (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

5. 8519 1-pb

  • 1. International Journal of Informatics and Communication Technology (IJ-ICT) Vol. 6, No. 2, August 2017, pp. 105~116 ISSN: 2252-8776, DOI: 10.11591/ijict.v6i2.pp105-116  105 Journal homepage: http://iaesjournal.com/online/index.php/IJICT A Markov Decision Model for Area Coverage in Autonomous Demining Robot Abdelhadi Larach*, Cherki Daoui, Mohamed Baslam Laboratory of Information Processing and Decision Support, Sultan Moulay Slimane University Faculty of sciences and Techniques, Beni-Mellal, Morocco Article Info ABSTRACT Article history: Received Feb 16, 2017 Revised Jun 29, 2017 Accepted Jul 19, 2017 A review of literature shows that there is a variety of works studying coverage path planning in several autonomous robotic applications. In this work, we propose a new approach using Markov Decision Process to plan an optimum path to reach the general goal of exploring an unknown environment containing buried mines. This approach, called Goals to Goals Area Coverage on-line Algorithm, is based on a decomposition of the state space into smaller regions whose states are considered as goals with the same reward value, the reward value is decremented from one region to another according to the desired search mode. The numerical simulations show that our approach is promising for minimizing the necessary cost-energy to cover the entire area. Keywords: Coverage Path Planning Markov Decision Process Robotic Path planning Shortest Path Planning Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved. Corresponding Author: Abdelhadi Larach, Laboratory of Information Processing and Decision Support, Sultan Moulay Slimane University, Faculty of sciences and Techniques, Beni-Mellal, Morocco. Email: larachabdelhadi@gmail.com 1. INTRODUCTION Shortest Path Planning (SPP) or Coverage Path Planning (CPP) is a task used in a large number of robotic applications, such as demining robots [1], painter robots [2], cleaning robots [3], etc. Several researches concerning SPP and CPP are presented in [4-6], SPP or CPP algorithms are classified in two categories: off-line algorithms, generally used in acknowledged environment and on-line algorithms, used in unrecognized environment. In on-line algorithms the policy is generally updated according to new environment observation. The CPP problem remains subject of research optimization, especially in an unknown environment. In Robotic landmines detection, the agent must detect and find location of all possible mines, avoid all obstacles and follow the shortest path in an unknown environment without overlapping paths. Satisfying such requirements is not always easy and possible. In fact, the whole structure of the environment is not known a priori and the on-line algorithm must be able to seek the optimal strategy according to the knowledge acquired after each observation. E.Galseran [6] presented a survey on CPP in Robotic, several methods were proposed. In this paper, we present a Discounted Markov Decision Model for robotics navigation in grid environment with a theoretical study which permit to propose a new approach for area coverage called Goals to Goals Area Coverage on-line Algorithm based on a decomposition of the state space into smaller regions whose all states are considered as goals and assigned with the same reward value. The reward value can be decreased from one region to another according to the desired search mode such as a line-sweep [13] or spatial cell diffusion [14] approaches.
2. MARKOV DECISION PROCESS
Markov Decision Processes are controlled stochastic processes that satisfy the Markov property and assign reward values to state transitions [7, 8]. Formally, an MDP is defined by the five-tuple (S, A, T, P, R), where S is the state space in which the process evolves; A is the set of all possible actions that control the state dynamics; T is the set of time steps at which decisions are made; P is the state transition probability function, where P(S_{t+1} = j | S_t = i, A_t = a) = P_{iaj} is the probability of moving to state j when action a is executed in state i, and S_t (A_t) denotes the state (action) at time t; R is the reward function defined on state transitions, where R_{ia} denotes the reward obtained when action a is applied in state i.

2.1. Discounted Reward Markov Decision Process
Let P_π(S_t = j, A_t = a | S_0 = i) be the conditional probability that at time t the system is in state j and the action taken is a, given that the initial state is i and the decision maker follows a strategy π. If R_t denotes the reward at time t, then for any strategy π and initial state i, the expectation of R_t is given by:

\mathbb{E}_\pi(R_t, i) = \sum_{j \in S} \sum_{a \in A(j)} P_\pi(S_t = j, A_t = a \mid S_0 = i) \, R_{ja}    (1)

In a discounted reward MDP, the value function, i.e. the expected total reward when the process starts in state i and follows the policy π, is defined by:

V_\pi^\alpha(i) = \sum_{t=0}^{\infty} \alpha^t \, \mathbb{E}_\pi(R_t, i), \quad i \in S    (2)

where α ∈ ]0, 1[ is the discount factor. The objective is to determine V*, the maximum expected total discounted reward vector over an infinite horizon. It is well known [7, 8] that V* satisfies the Bellman equation:

V(i) = \max_{a \in A(i)} \Big\{ R_{ia} + \alpha \sum_{j \in S} P_{iaj} \, V(j) \Big\}, \quad i \in S    (3)

Moreover, the actions attaining the maximum in (3) give rise to an optimal pure policy π* given by:

\pi^*(i) \in \operatorname*{arg\,max}_{a \in A(i)} \Big\{ R_{ia} + \alpha \sum_{j \in S} P_{iaj} \, V^*(j) \Big\}, \quad i \in S    (4)

2.2. Gauss-Seidel Value Iteration Algorithm
The Gauss-Seidel Value Iteration (GSVI) algorithm is one of the most widely used iterative algorithms for finding optimal or approximately optimal policies in discounted MDPs [8-10]. In this paragraph, we present the following optimized pseudo-code of the GSVI algorithm (Algorithm 1).

Algorithm 1. GSVI Algorithm.
GSVI (In: S, P, A, R, Γ_a^+, α, ε; Out: V*, π*)
1. For all i ∈ S Do V*(i) ← 0;                       // Initialization
2. Bellman_err ← 2ε;                                 // For the stopping criterion
3. While (Bellman_err ≥ ε) Do
     Bellman_err ← 0;                                // reset for this sweep
     For all i ∈ S Do                                // Value improvement
       V_tmp ← max_{a ∈ A(i)} { R_ia + α Σ_{j ∈ Γ_a^+(i)} P_iaj V*(j) }
       Bellman_err ← Max(|V*(i) − V_tmp|, Bellman_err)
       V*(i) ← V_tmp
4. For all i ∈ S Do                                  // Policy extraction
     π*(i) ← argmax_{a ∈ A(i)} { R_ia + α Σ_{j ∈ Γ_a^+(i)} P_iaj V*(j) }
5. Return V*, π*
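For concreteness, the following is a minimal Java sketch of Algorithm 1 with sparse successor lists. All class, field and array names are illustrative choices of ours, and the transition model (successor lists, probabilities, rewards) is assumed to be supplied by the caller; the paper's own Java implementation is not reproduced here.

// Minimal sketch of Algorithm 1 (GSVI with successor lists). Every state is assumed
// to have at least one admissible action (in the model of Section 3, the action θ).
public final class Gsvi {

    /** Result holder: optimal value vector and a pure policy (action index per state). */
    public static final class Result {
        public final double[] value;
        public final int[] policy;
        Result(double[] value, int[] policy) { this.value = value; this.policy = policy; }
    }

    /**
     * @param succ   succ[i][a][k]  = k-th successor state of the pair (i, a), i.e. the list Γ_a^+(i)
     * @param prob   prob[i][a][k]  = P_{i a succ[i][a][k]}
     * @param reward reward[i][a]   = R_{ia}
     * @param alpha  discount factor in ]0, 1[
     * @param eps    stopping threshold on the Bellman error
     */
    public static Result solve(int[][][] succ, double[][][] prob, double[][] reward,
                               double alpha, double eps) {
        int nStates = succ.length;
        double[] v = new double[nStates];            // V*(i), initialized to 0
        double bellmanErr = 2 * eps;
        while (bellmanErr >= eps) {
            bellmanErr = 0.0;                        // reset once per sweep
            for (int i = 0; i < nStates; i++) {      // Gauss-Seidel: updated values are reused at once
                double best = bestQ(i, succ, prob, reward, alpha, v)[0];
                bellmanErr = Math.max(bellmanErr, Math.abs(v[i] - best));
                v[i] = best;
            }
        }
        int[] policy = new int[nStates];             // policy extraction (step 4)
        for (int i = 0; i < nStates; i++) {
            policy[i] = (int) bestQ(i, succ, prob, reward, alpha, v)[1];
        }
        return new Result(v, policy);
    }

    /** Returns { best Q-value, best action index } for state i under the current value vector. */
    private static double[] bestQ(int i, int[][][] succ, double[][][] prob,
                                  double[][] reward, double alpha, double[] v) {
        double best = Double.NEGATIVE_INFINITY;
        int bestA = 0;
        for (int a = 0; a < succ[i].length; a++) {
            double q = reward[i][a];
            for (int k = 0; k < succ[i][a].length; k++) {
                q += alpha * prob[i][a][k] * v[succ[i][a][k]];
            }
            if (q > best) { best = q; bestA = a; }
        }
        return new double[] { best, bestA };
    }
}

Only the successors in Γ_a^+(i) are visited in the inner loop, which is what keeps each sweep in O(|Γ_a^+||S|) as discussed below.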
We denote by Γ_a^+(i) = { j ∈ S : P_{iaj} > 0 } the successor list of the pair (i, a), i ∈ S, a ∈ A(i). Algorithm 1 is based on these state-action successor lists, which accelerates each iteration compared to the classical GSVI algorithm, especially when the number of actions and successors per state is much smaller than the number of states. Indeed, the complexity of the proposed version is reduced to O(|Γ_a^+||S|) per iteration, where |Γ_a^+| is the average number of state-action successors.

3. MARKOV DECISION MODEL FOR ROBOTIC NAVIGATION
To model robotic navigation in general, and the demining robot problem in particular, as an MDP, the five-tuple (S, A, T, P, R) must be defined and an environment representation must be chosen.
a. Grid Environment: The grid-based method, well suited to landmine detection, is used to model the environment, which is entirely discretized according to a regular grid. The grid size can be chosen according to the robot structure and the field covered by the robot sensor.
b. State Space: Using the grid method, the state space is the set of grid cells; each cell has an associated label: obstacle, mine, free or goal state.
c. Action Space: The robot can be controlled through nine actions: the eight compass directions and the action denoted θ, which keeps the process in its current state. Actions that would move the robot onto an obstacle or a mine state are eliminated; in a goal state the only possible action is θ. Figure 1 shows the possible actions in an environment example, where the black cell is an obstacle state and the green cell is a goal state.
Figure 1. Environment example and possible robot actions in each state
d. Reward Function: A transition to a free or goal state is characterized by an energy cost proportional to the distance travelled: it equals a predefined constant x if the action is diagonal and x/√2 if the action is horizontal or vertical, so the corresponding reward is −x or −x/√2. The reward assigned to the action θ is zero in a free state and equal to a predefined constant R_b in a goal state.
e. Transition Function: The transition function captures the uncertainty in the effects of actions; it is a problem datum and can be estimated by reinforcement learning.
f. Robot Sensor: Several landmine detection methods can be used, such as metal detector technologies, electromagnetic methods [11], etc. In this work, we suppose that the robot can detect the buried mines in nearby states (Figure 2) using its own sensing system; an arm movement or a multi-sensor technology would cover these states.
Figure 2. States covered by the robot sensor
g. Robot Structure: Several robot mechanical structures can be used for a demining robot, such as omnidirectional or multi-directional conductive control [12], etc.

Remark 1. Using the proposed model, the time complexity of Algorithm 1 is O(|Γ_a^+||S′|) per iteration, where |S′| is the number of free states; for a goal state the expected reward is the constant R_b/(1−α) (given by (3)), since the only possible action in a goal state is θ. Moreover, |Γ_a^+| is much smaller than |S′| and can be considered constant, so Algorithm 1 is linear per iteration.
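The sketch below illustrates how this grid model can be encoded in Java: the nine actions, the admissible-action filter and the reward structure −x, −x/√2, 0 and R_b. The cell encoding and all identifiers are our own illustrative choices, not the authors' code.

// Illustrative encoding of the grid model of Section 3.
public final class GridModel {

    public enum Cell { FREE, OBSTACLE, MINE, GOAL }

    // Row/column offsets of the eight compass actions; index 8 is the stay action θ.
    static final int[] DX = { -1, -1, -1, 0, 0, 1, 1, 1, 0 };
    static final int[] DY = { -1, 0, 1, -1, 1, -1, 0, 1, 0 };
    static final int THETA = 8;

    final Cell[][] grid;
    final double x;       // cost of a diagonal move
    final double rb;      // reward of action θ in a goal state

    GridModel(Cell[][] grid, double x, double rb) {
        this.grid = grid; this.x = x; this.rb = rb;
    }

    /** Actions admissible in cell (r, c): moves onto obstacles or mines are removed;
     *  in a goal state only θ remains. */
    java.util.List<Integer> actions(int r, int c) {
        java.util.List<Integer> acts = new java.util.ArrayList<>();
        if (grid[r][c] == Cell.GOAL) { acts.add(THETA); return acts; }
        for (int a = 0; a < 8; a++) {
            int nr = r + DX[a], nc = c + DY[a];
            boolean inside = nr >= 0 && nr < grid.length && nc >= 0 && nc < grid[0].length;
            if (inside && grid[nr][nc] != Cell.OBSTACLE && grid[nr][nc] != Cell.MINE) acts.add(a);
        }
        acts.add(THETA);
        return acts;
    }

    /** Reward R_{ia} of applying action a in cell (r, c). */
    double reward(int r, int c, int a) {
        if (a == THETA) return grid[r][c] == Cell.GOAL ? rb : 0.0;
        boolean diagonal = DX[a] != 0 && DY[a] != 0;
        return diagonal ? -x : -x / Math.sqrt(2.0);   // -x for diagonal, -x/√2 for straight moves
    }
}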
4. THEORETICAL MODEL STUDY
In this section, we present a theoretical study of the considered model, which is the basis of our approach.

Proposition 1. If the state space does not contain any goal state, then the strategy π* defined by π*(s0) = θ for every free state s0 is optimal.

Proof. Suppose there exists an optimal pure strategy π* and a state s0 ∈ S such that π*(s0) ≠ θ. The expected reward when the system starts in state s0 is:

V_{\pi^*}^{\alpha}(s_0) = V^*(s_0) = \mathbb{E}_{\pi^*}(R_0, s_0) + \sum_{t=1}^{\infty} \alpha^t \, \mathbb{E}_{\pi^*}(R_t, s_0)    (5)

The expectation of R_t at time t = 0 under strategy π* is given by:

\mathbb{E}_{\pi^*}(R_0, s_0) = \sum_{a \in A(s_0)} P_{\pi^*}(S_0 = s_0, A_0 = a \mid S_0 = s_0) \, R_{s_0 a} = R_{s_0 \pi^*(s_0)}    (6)

We have R_{s0 π*(s0)} ≤ −x/√2 and, since there is no goal state, every subsequent expected reward is non-positive, hence V_{π*}^α(s0) ≤ −x/√2 < 0. The fact that V_{π*}^α(s0) < 0 implies that π*(s0) cannot be an optimal action, since there exists a strategy π′ with π′(s0) = θ and V_{π′}^α(s0) = 0, which contradicts the assumption. Figure 3 (left) shows a simulation result in an environment example where no goal state is defined; as can be seen, π*(s0) = θ for all s0 ∈ S.

Proposition 2. Let s0 be a free initial state. If there is no path leading from s0 to the goal state G, then π*(s0) = θ.

Proof. The proof is similar to that of Proposition 1: for any strategy π such that π(s0) ≠ θ, V_π^α(s0) < 0. Figure 3 (right) shows an example of a simulation result; as can be seen, π*(s0) = θ for every s0 ∈ S from which there is no path to the goal state G.
Figure 3. Examples of optimal strategies (no goal state (left); no path to the goal state for some states (right))

Proposition 3. Let G be a goal state with reward value R_b = 0. Then for any free initial state s0, π*(s0) = θ.

Proof. This is clear, since for any strategy π such that π(s0) ≠ θ, V_π^α(s0) < 0. Figure 4 shows a simulation example in an environment with four goal states of reward value R_b = 0; as can be seen, π*(s0) = θ for every free initial state.

Figure 4. Optimal strategy in an environment example with four goal states with reward value R_b = 0

Proposition 4. Let s0 be a free initial state, suppose there exists a path of length l from s0 to the goal state G, and let R_b be the reward obtained when the action θ is applied in the goal state G. Then:

\text{If } R_b > \frac{x}{\alpha^l} - x \text{ then } \pi^*(s_0) \neq \theta    (7)

Proof. Let V*(s0) be the expected reward when the process starts in state s0 and follows the path to G:

V^*(s_0) = \sum_{t=0}^{\infty} \alpha^t R_t = \sum_{t=0}^{l-1} \alpha^t R_f^t + \sum_{t=l}^{\infty} \alpha^t R_b    (8)

where R_f^t = \mathbb{E}_{\pi^*}(R_t, s_0) = \sum_{j \in S} \sum_{a \in A(j)} P_{\pi^*}(S_t = j, A_t = a \mid S_0 = s_0) \, R_{ja} is the expected reward obtained when a compass action is taken in the state occupied at time t. Equation (8) can be rewritten as:

V^*(s_0) = \sum_{t=0}^{l-1} \alpha^t R_f^t + \alpha^l \sum_{t=0}^{\infty} \alpha^t R_b    (9)

Let V*(s_g) be the expected reward when the process starts in the goal state G; by (3), we have:

V^*(s_g) = \sum_{t=0}^{\infty} \alpha^t R_b = \frac{R_b}{1-\alpha}    (10)

Using (10), equality (9) can be rewritten as:

V^*(s_0) = \sum_{t=0}^{l-1} \alpha^t R_f^t + \frac{R_b \, \alpha^l}{1-\alpha}    (11)

The fact that R_{ja} ≥ −x implies that R_f^t ≥ −x, and then:
\sum_{t=0}^{l-1} \alpha^t R_f^t \geq -x \sum_{t=0}^{l-1} \alpha^t    (12)

By applying (12) in (11), we obtain:

V^*(s_0) \geq \frac{R_b \, \alpha^l}{1-\alpha} - x \sum_{t=0}^{l-1} \alpha^t    (13)

Since \sum_{t=0}^{l-1} \alpha^t = \frac{1-\alpha^l}{1-\alpha}, inequality (13) becomes:

V^*(s_0) \geq \frac{R_b \, \alpha^l - x + x \, \alpha^l}{1-\alpha}    (14)

The fact that R_b > \frac{x}{\alpha^l} - x implies that R_b \alpha^l - x + x \alpha^l > 0, so (14) implies that V*(s0) is greater than zero:

V^*(s_0) \geq \frac{R_b \, \alpha^l - x + x \, \alpha^l}{1-\alpha} > 0    (15)

The fact that V*(s0) > 0 implies that π*(s0) ≠ θ.

Figure 5 shows a strategy example generated using Algorithm 1 with parameters R_b = 200, α = 0.2 and x = 1. For a path length l = 3, the threshold is x/α^3 − x = 125 − 1 = 124 < R_b = 200, whereas for l = 4 it is x/α^4 − x = 625 − 1 = 624 > R_b = 200. From these two values we can conclude that if l < 4 then π*(s0) ≠ θ.

Figure 5. An optimal strategy example using parameters α = 0.2, x = 1 and R_b = 200

Proposition 5. Let G be a goal state with reward value R_b = x/α^{|S|}. For any free initial state s0, if there exists a path to the goal state G then π*(s0) ≠ θ.

Proof. Let l be the length of the path to the goal state G; by Proposition 4, if R_b > x/α^l − x then π*(s0) ≠ θ. Let us show that x/α^{|S|} > x/α^l − x: we have |S| > l, hence α^{|S|} < α^l and x/α^{|S|} > x/α^l > x/α^l − x, since x > 0.

Figure 6 shows a path strategy example with parameters α = 0.2, x = α^{|S|} and R_b = 1 (note that if x = α^{|S|} then R_b = x/α^{|S|} = 1); as can be seen, π*(s0) ≠ θ for every free initial state s0.
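As a quick numerical check of condition (7): solving R_b > x/α^l − x for l gives α^l > x/(R_b + x), i.e. l < log(x/(R_b + x))/log α since log α < 0. The following small helper (our illustration, not part of the paper's algorithms) computes the largest attracting path length; with the parameters of Figure 5 it returns 3, consistent with the conclusion that π*(s0) ≠ θ whenever l < 4.

// Illustrative check of condition (7): R_b > x/α^l − x  ⇔  l < log(x/(R_b + x)) / log(α).
public final class GoalAttraction {

    /** Largest path length l for which a goal of reward rb still attracts the robot. */
    static int maxAttractingLength(double alpha, double x, double rb) {
        double bound = Math.log(x / (rb + x)) / Math.log(alpha); // both logarithms are negative
        return (int) Math.ceil(bound) - 1;                       // largest integer strictly below the bound
    }

    public static void main(String[] args) {
        // Parameters of Figure 5: α = 0.2, x = 1, R_b = 200  ->  prints 3
        System.out.println(maxAttractingLength(0.2, 1.0, 200.0));
    }
}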
Proposition 6. Let G1 and G2 be goal states with reward values R_b^1 = 1 and R_b^2 = 1 + 1/α^{|S|}, and suppose there exists a path of length l1 (respectively l2) from the initial state s0 to G1 (respectively G2). Then, with the move cost x = α^{|S|} as in the proposed algorithms, the optimal strategy navigates the robot to the goal state G2 for all l1 ≥ 1.

Proof. Suppose there exists an optimal strategy π1* that moves the robot to the goal state G1. Equation (11) and the fact that l1 ≥ 1 imply that:

V_{\pi_1^*}^{\alpha}(s_0) < \frac{\alpha R_b^1}{1-\alpha} = \frac{\alpha}{1-\alpha} < \frac{1}{1-\alpha}    (16)

Let π2 be a strategy that moves the robot to the goal state G2. Inequality (15) and the fact that l2 < |S| imply that:

V_{\pi_2}^{\alpha}(s_0) \geq \frac{R_b^2 \, \alpha^{l_2} - x + x \, \alpha^{l_2}}{1-\alpha} > \frac{R_b^2 \, \alpha^{|S|} - x + x \, \alpha^{|S|}}{1-\alpha}    (17)

Using the facts that R_b^2 = 1 + 1/α^{|S|} and x = α^{|S|}, inequality (17) can be rewritten as:

V_{\pi_2}^{\alpha}(s_0) > \frac{\left(1 + \frac{1}{\alpha^{|S|}}\right) \alpha^{|S|} - x + x \, \alpha^{|S|}}{1-\alpha} = \frac{1 + \alpha^{2|S|}}{1-\alpha} > \frac{1}{1-\alpha}    (18)

Inequalities (16) and (18) imply that:

V_{\pi_2}^{\alpha}(s_0) > \frac{1 + \alpha^{2|S|}}{1-\alpha} > \frac{1}{1-\alpha} > V_{\pi_1^*}^{\alpha}(s_0)    (19)

The fact that V_{π2}^α(s0) > V_{π1*}^α(s0) implies that π1* cannot be an optimal strategy.

5. GOALS TO GOALS AREA COVERAGE ALGORITHMS
In this section, we present the proposed on-line algorithms for area coverage in an autonomous demining robot, based on the model defined in Section 3 and the theoretical results of the previous section. We begin with a first version (Algorithm 2), called Goals to Goals Randomized Area Coverage, in which all states except the initial state are assumed to be goals with the same reward value. Starting from its initial position, the robot senses the nearby states, updates the detected states, computes a path strategy using Algorithm 1 and moves according to an optimal action; these steps are repeated until the optimal action is θ.

Algorithm 2. Goals to Goals Randomized Area Coverage.
Data: S, P, A, R, Γ_a^+, α, ε; x = α^{|S|}; s0: initial state
1. Set all states as goals (except the initial state) with the same reward R_b = 1.
2. Repeat
   2.1. Observe the nearby states with the sensor
   2.2. Update the observed states
   2.3. Compute a strategy using Algorithm 1
   2.4. Move the robot using an optimal action
   Until (the optimal action is equal to θ)
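A schematic Java skeleton of Algorithm 2's observe-update-replan-move loop is given below. The Planner, Sensor and Actuator interfaces are placeholders standing in for Algorithm 1, the mine sensor and the drive system; they are our own illustration, not the authors' implementation.

// Schematic skeleton of Algorithm 2 (illustrative only).
public final class GoalsToGoalsCoverage {

    /** Wraps Algorithm 1: returns an optimal action for the current state under the current model. */
    interface Planner  { int optimalAction(int state); }
    /** Mine/obstacle sensor: updates the internal model with the states observed around `state`. */
    interface Sensor   { void observeAndUpdate(int state); }
    /** Drive system: executes an action and returns the resulting robot state. */
    interface Actuator { int execute(int state, int action); }

    static final int THETA = 8;  // index of the stay action θ

    /** Observe, update, replan, move; stops when θ becomes optimal (area covered, cf. the Theorem below). */
    static void cover(int s0, Sensor sensor, Planner planner, Actuator actuator) {
        int state = s0;
        while (true) {
            sensor.observeAndUpdate(state);              // steps 2.1 and 2.2
            int action = planner.optimalAction(state);   // step 2.3 (Algorithm 1 on the updated model)
            if (action == THETA) break;                  // stopping criterion of Algorithm 2
            state = actuator.execute(state, action);     // step 2.4
        }
    }
}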
Theorem. Algorithm 2 works correctly and terminates after covering the entire area, except the regions that are not reachable from the start state s0.

Proof. The proof follows from Propositions 1, 2, 4 and 5. Indeed, Propositions 1 and 2 imply that if there is at least one state that is not yet explored and is reachable from the current robot position, the optimal action is different from θ. Moreover, after each observation the status of the nearby states (initially supposed to be goals) is updated; navigation to the nearest goal is ensured by Propositions 3 and 4, and the robot stops after exploring the entire environment except the states unreachable from the start state s0 (Proposition 2).

Figure 7 shows a path generated using Algorithm 2 in an obstacle-free environment with ten randomly placed mines. As can be seen, the entire area is covered, but the search path is pseudo-random and the robot overlaps some regions previously explored.

Figure 7. Environment example and path generated using Algorithm 2

To reduce the overlapping paths, we propose a second version (Algorithm 3) based on decreasing the reward values according to some search mode, such as the line-sweep approach (Figure 8) described in [13] or the spatial cell diffusion approach presented in [14]. In each smaller region (in Figure 8, for example, a smaller region contains nine states), all states are considered as goals with the same fixed reward value, and the search mode is ensured by Proposition 6.

Figure 8. Example of reward values decremented according to the line-sweep search mode
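As an illustration of step 3 of Algorithm 3 below, the following sketch assigns the decreasing region rewards of Figure 8 for a line-sweep decomposition of the grid into p vertical bands. The band geometry, the sweep direction (leftmost band rewarded highest) and all identifiers are our own illustrative choices, not the paper's code.

// Illustrative assignment of the decreasing region rewards R_i = 1 + i/α^|S| of Algorithm 3.
public final class RegionRewards {

    /** rewards[r][c] = R_i for the band containing cell (r, c), with i = p, ..., 1 from left to right. */
    static double[][] lineSweepRewards(int nRows, int nCols, int p, double alpha) {
        int nStates = nRows * nCols;                      // |S|
        double unit = 1.0 / Math.pow(alpha, nStates);     // 1/α^|S| (may overflow to infinity for very large |S|)
        double[][] rewards = new double[nRows][nCols];
        int bandWidth = (int) Math.ceil(nCols / (double) p);
        for (int r = 0; r < nRows; r++) {
            for (int c = 0; c < nCols; c++) {
                int band = c / bandWidth;                 // 0 .. p-1, left to right
                int i = p - band;                         // band swept first receives the largest reward
                rewards[r][c] = 1.0 + i * unit;           // R_i = 1 + i/α^|S|
            }
        }
        return rewards;
    }
}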
Algorithm 3. Goals to Goals Search Mode Area Coverage.
Data: S, P, A, R, Γ_a^+, α, ε; x = α^{|S|}; s0: initial state
1. Decompose the state space into p smaller regions.
2. Define the decreasing reward values according to the desired search mode.
3. For each smaller region i = p, ..., 1 Do
   Set all states in region i as goals with R_i = 1 + i/α^{|S|}.
4. Repeat
   4.1. Observe the nearby states
   4.2. Update the observed states
   4.3. Compute a strategy using Algorithm 1
   4.4. Move the robot using an optimal action
   Until (the optimal action is equal to θ)

6. SIMULATION RESULTS
The proposed algorithms are simulated using a Java implementation; Figures 9, 10 and 11 show the simulation results for several search modes in an obstacle-free environment example with ten randomly placed buried mines. It can be seen from Figures 9, 10 and 11 that Algorithm 3 works correctly, with a low repetition rate compared to Algorithm 2. In some cases it is optimal and beneficial for the robot to finish the area coverage near its initial position; the variant of the line-sweep search mode (Figure 11) can then be used. To evaluate these search modes, we use the following valuation function:

E = \sum_{t=0}^{T} \frac{|R_{s_t}|}{x} = \sum_{t=0}^{T} C_{s_t}    (20)

where C_{s_t} = 1/√2 or 1 and T is the number of decisions needed to complete the area coverage. This valuation function is proportional to the time required for the entire exploration; the number of rotations could also be added to it.

Table 1. Valuation Function: Comparison of Several Search Modes
Search mode               Ē
Randomized                100
Spatial Cell Diffusion    83
Line Sweep                80
Variant of Line Sweep     82

Table 1 presents the average value of E for each search mode, over several irregular buried-mine layouts generated randomly in the same environment example. As can be seen, these search modes achieve the area coverage of an obstacle-free environment with a low energy cost, especially the line-sweep search mode, which is also useful in the decomposition method for real environments with obstacles.

Figure 9. Online strategy generated using Algorithm 3 for the spatial cell diffusion search mode
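For reference, the valuation function (20) used in Table 1 can be computed from the sequence of executed moves, for instance as in the following small sketch; this is our illustration of the formula, not the paper's evaluation code.

// Sketch of equation (20): cost 1 per diagonal move, 1/√2 per horizontal or vertical move.
public final class CoverageCost {

    /** Each executed move is given by its row/column offsets (dr, dc), both in {-1, 0, 1}. */
    static double valuation(int[][] moves) {
        double e = 0.0;
        for (int[] m : moves) {
            boolean diagonal = m[0] != 0 && m[1] != 0;
            e += diagonal ? 1.0 : 1.0 / Math.sqrt(2.0);   // C_{s_t} of equation (20)
        }
        return e;
    }

    public static void main(String[] args) {
        // Two straight moves and one diagonal move: E = 2/√2 + 1 ≈ 2.414
        System.out.println(valuation(new int[][] { {0, 1}, {1, 0}, {1, 1} }));
    }
}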
Figure 10. Online strategy generated using Algorithm 3 for the line-sweep search mode

Figure 11. Online strategy generated using Algorithm 3 for a variant of the line-sweep search mode

Remark 2. Area coverage without overlapping paths is not always easy or even possible, especially in an unknown environment. Figure 12 shows a strategy example with some overlapping paths.

Figure 12. Online strategy example with some overlapping paths

Remark 3. For a large state space and a real environment with obstacles, a line-sweep decomposition into monotone subregions can be used [13]; in each subregion, an adequate line-sweep search mode ensures optimal area coverage. The robot explores the subregions one by one; once a subregion has been covered, its states can be turned into goal states with reward R_b = 0, so that the robot never returns to an explored region (Proposition 3). In this way the decision time of Algorithm 1 can be kept compatible with real time, since it is proportional to the number of free states (Remark 1).
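A minimal sketch of the re-labelling step described in Remark 3 is given below, assuming a rectangular subregion and a per-cell array holding the reward of the action θ; the identifiers are illustrative and not taken from the paper's code.

// Once a subregion has been fully explored, turn its cells into goal states with reward
// R_b = 0 so that, by Proposition 3, the optimal strategy never sends the robot back into it.
public final class SubregionClosing {

    public enum Cell { FREE, OBSTACLE, MINE, GOAL }

    /** Marks every non-obstacle, non-mine cell of the rectangular subregion as a zero-reward goal.
     *  goalReward[r][c] holds the reward of the action θ in cell (r, c). */
    static void closeSubregion(Cell[][] grid, double[][] goalReward,
                               int rMin, int rMax, int cMin, int cMax) {
        for (int r = rMin; r <= rMax; r++) {
            for (int c = cMin; c <= cMax; c++) {
                if (grid[r][c] == Cell.OBSTACLE || grid[r][c] == Cell.MINE) continue;
                grid[r][c] = Cell.GOAL;       // only θ remains admissible here
                goalReward[r][c] = 0.0;       // R_b = 0, so no incentive to re-enter (Proposition 3)
            }
        }
    }
}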
Figure 13 shows an environment example with obstacles divided into two subregions; after the left subregion has been explored, its states are transformed into goals with reward value equal to zero, and the generated path is shown in Figure 14.

Figure 13. Environment example with an obstacle (blue region) divided into two subregions

Figure 14. Online strategy generated for the environment example in Figure 13

Remark 4. To reduce memory consumption in a large state space, each subregion can be aggregated into a single state before and after its exploration; only the active subregion is disaggregated.

7. CONCLUSIONS
In this work, we have presented a discounted Markov decision model for robotic navigation in a grid environment, obtained by adding a fictitious action in each state, which is the key to the simplicity of the proposed algorithms; the use of this model is certainly not limited to autonomous landmine detection and can serve several other robotic applications. We have also presented a theoretical study of the discounted MDP in the proposed model to guarantee an optimal path strategy. This theoretical study allowed us to prove the correctness of the proposed new approach for finding a good on-line path strategy for detecting buried mines in an obstacle-free environment where hidden mines are randomly distributed in different positions. The simulation results show that our approach is encouraging and promising; in future work we intend to extend it to different types of environment and to the partially observable MDP setting, where the robot position is stochastic.
REFERENCES
[1] H. Najjaran and N. Kircanski, "Path Planning for a Terrain Scanner Robot," in Proc. 31st Int. Symp. Robotics, Montreal, QC, Canada, 2000, pp. 132-137.
[2] P. Atkar et al., "Uniform Coverage of Automotive Surface Patches," The International Journal of Robotics Research, vol. 24, no. 11, pp. 883-898, 2005.
[3] J. S. Oh et al., "Complete Coverage Navigation of Cleaning Robots Using Triangular-Cell-Based Map," IEEE Transactions on Industrial Electronics, vol. 51, no. 3, pp. 718-726, 2004.
[4] P. Tokekar, N. Karnad, and V. Isler, "Energy-Optimal Trajectory Planning for Car-Like Robots," Autonomous Robots, vol. 37, no. 3, pp. 279-300, 2014.
[5] J. W. Kang, S. J. Kim, and M. J. Chung, "Path Planning for Complete and Efficient Coverage Operation of Mobile Robots," in IEEE International Conference on Mechatronics and Automation, pp. 2126-2131, 2007.
[6] E. Galceran and M. Carreras, "A Survey on Coverage Path Planning for Robotics," Robotics and Autonomous Systems, vol. 61, pp. 1258-1276, 2013.
[7] R. E. Bellman, "Dynamic Programming," Princeton University Press, Princeton, NJ, 1957.
[8] M. Puterman, "Markov Decision Processes: Discrete Stochastic Dynamic Programming," John Wiley & Sons, Inc., New York, USA, 1994.
[9] D. J. White, "Markov Decision Processes," John Wiley & Sons, Inc., New York, 1994.
[10] D. P. Bertsekas, "Dynamic Programming and Optimal Control," Belmont: Athena Scientific, 2001.
[11] C. P. Goonerante, S. C. Mukhopahyay, and G. Sen Gupta, "A Review of Sensing Technologies for Landmine Detection: Unmanned Vehicle Based Approach," in Proc. 2nd International Conference on Autonomous Robots and Agents, Palmerston North, New Zealand, December 13-15, 2004.
[12] K. Suresh, K. Vidyasagar, and A. F. Basha, "Multi Directional Conductive Metal Detection Robot Control," International Journal of Computer Applications, vol. 109, no. 4, 2015.
[13] W. H. Huang, "Optimal Line-Sweep-Based Decompositions for Coverage Algorithms," in Proc. 2001 IEEE International Conference on Robotics and Automation (ICRA), vol. 1, pp. 27-32, 2001.
[14] S. W. Ryu et al., "A Search and Coverage Algorithm for Mobile Robot," in Proc. 8th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), pp. 815-821, 2011.