MS Project

Darin Hitchings
8/18/02
Vehicle Routing Project
Overview:
The purpose of this vehicle routing project is to explore sub-optimal solutions to the difficult
problem of object classification with noisy sensors on a 2D grid with locality constraints. The problem
at hand is to find the best policy for planning vehicle movements to explore the graph in order to
minimize a cost-function. An exact Dynamic Programming solution for such a problem is highly
infeasible because the complexity of the DP algorithm grows exponentially with the number of cells in
the grid, which is proportional to the number of states in the problem. Instead, this project formulates
the problem as a multi-commodity network flow problem with each vehicle as a commodity and tasks
assigned at each grid square. The vehicles must flow through the graph to collect value and are
constrained to move in 4 directions (the 4-neighbors of each cell) as well as constrained by the
boundaries of the grid. A simulation was created to compare the benefit of this algorithm with a
standard myopic policy for vehicle movement.
Terms and Definitions:
Throughout the entirety of this paper, the words “cell”, “node” and “coordinate” are all defined
to mean a grid-square within the graph where a task can be performed. The simulation uses a regular,
square grid and hence there will be G^2 possible tasks at any one time for one vehicle on the graph
where G is defined to be the width of the grid (width = height). In addition, the words “vehicle”,
“sensor platform” or just “platform” are used interchangeably. Each side of each cell has an associated
directed arc which represents the flow across that boundary of the cell of a given sensor platform at a

given time. These flows come in pairs: there exists a corresponding flow entering every cell for each
direction in which there is a flow leaving a cell. Note that corner cells have 2 pairs of flows crossing
their two interior faces, cells on the side of the grid have 3 pairs of flows associated with their interior
faces and cells within the interior of the grid have 4 pairs of flows crossing their exposed sides. The
convention within this project is that a positive flow at a cell is directed outwards from that cell.
Lastly, the coordinate system defines the cell at the bottom-left corner of the grid as (0,0) and the cell
at the top-right corner of the grid as (G-1,G-1). A task for this problem consists of an assignment for a
given vehicle to make a measurement at a particular cell at a particular time. Tasks at cells are denoted
X(m,k,t) where m is the cell index, k is the number of the vehicle doing the task and t is the time at
which the task is done. “Arcs” which are the vehicle flows are denoted Y(m,n,k,t). This notation
specifies that platform k is flowing from cell index m to cell index n at time t. All vehicle flows are
positive by convention. Bounds on k and t are: 0 ≤ k < K and 0 ≤ t < T, where K is the number of
platforms in the simulation and T is the number of times in which the platforms can move. G, K, T є
Ζ1
. In this project T gives the initial length of the planning horizon. T needs to be an even number or
else it isn’t possible for the platforms to end up back at base at the end of the simulation. As time
progresses the planning horizon shrinks because there are fewer and fewer moves to go before time is
up. A constraint is placed on all platforms that says they must return to base by the end of the planning
horizon. If a cell is referred to by index instead of coordinate, the indices are taken to start at (0,0) and
increase along the rows in the direction of +x. Coordinate (0,1) is at index 10, coordinate (9,9) is at
index 99 when G=10.
Variables:
1) x(m,k,t) : the fractional amount of task m (task m = node m = cell m) which is done at time t by sensor k. This value
is a real number. (sensor k = platform k = vehicle k)

2) y(m,n,k,t) : fractional flow of sensor k from node m to node n at time t, is 0 for n not in F*(m) which is to say for
cells that are not neighbors of m. Fractional flow values are real numbers.
3) F*(m) : the 4 neighboring cells of node m
4) m0 : the base cell where all of the platforms start at and return to, typically at the center of the grid
5) Ns
(m,t) : number of sensors entering the system, non-zero only when t=0 and m=m0, (for the initial LP problem)
6) Ne
(m,t) : number of sensors leaving the system, non-zero only when t=T and m= m0
7) v(m) : value of sensing task at node m = entropy of node m
Algorithm:
The solution method used in this project uses an approximation to the optimal cost-to-go
function, which would be computed from the D.P. algorithm. The basic equation of dynamic
programming is the Bellman Equation, which is a recursive equation backwards in time. This equation
states:
( )[ ])1()),(())((* min)),((
++=
∈
txJtxgtxJ EttxUu
ω
ω
where ( ))(),(),()1( ttutxftx ω=+ .
This equation basically says that if one starts at the end of the time horizon one can work one’s way
backwards to the beginning until the current time is reached and build a (very big) table of optimal
moves for every state that can occur through the future course of the simulation. In the Bellman
Equation, J*() is the optimal cost-to-go, x is the state, U(x(t)) is the set of possible actions that is
available at state x at time t, and g is the local cost of being in the current state. This equation does not
lend itself towards an easy solution when the length of the time horizon or the number of possible
states or both grows large, so even though the solution is optimal, for real-time decision making other
suboptimal methods are required.
Several relevant papers1,2
indicate that using the rollout-algorithm (described below) to
approximate the cost-to-go, J(x(t)), yields much of the performance given by the optimal policy from

D.P. With this idea in mind, this project breaks the solution procedure into two steps. First there is a
look-ahead step in which every possible move is considered up to a certain depth in the future. After
this given depth, the program then uses a rollout step to approximate the cost-to-go farther ahead in
time than the look-ahead can see. Given the sum of the exact local costs of every possible future state
for the next several moves in the look-ahead region plus an approximate cost-to-go from the rollout
region, an approximation to the actual cost-to-go is computed. The aforementioned “regions” in the
solution procedure are regions in time. The costs-to-go values are used to decide which of the possible
next moves is best: by choosing moves that minimize the cost-to-go or maximize the reward-to-go,
one is assured that all vehicles follow an optimal path. The state of the system is comprised of the
position of the sensor platforms and the probability vector at each cell, so minimizing the cost-to-go
from a given state ensures each platform is positioned optimally at all future times. Optimality is
ensured through penalizing bad future vehicle positions (bad states) by assigning extra cost to them
with the objective function. The current version of the simulation uses a look-ahead window of one
move.
The idea behind the rollout algorithm is that a base policy is fixed and then successively
evaluated in time down to the end of the planning horizon to get an approximation of the optimal cost-
to-go that the base policy would give were it actually used at every decision step. Base policies are
heuristical in nature because again optimal policies are intractable. By rolling out these heuristics to
the end of the planning horizon, a program can greatly increase the performance of the base heuristic
by predicting how it will do over time. Using a heuristic in the rollout algorithm helps to eliminate the
explosive complexity of looking forward in time. The rollout step for this project is performed by
making a Linear Programming approximation, which relaxes the constraints that specify every vehicle,
must be in one place at one time. Instead the relaxed constraints decree on average vehicles must be in

one place at one time. This approximation converts the rollout step into a linear problem, which can be
solved efficiently using the Simplex Method of Linear Programming. The algorithm is linear in the
number of possible moves at a given state, times the cost to compute an approximate cost-to-go, which
is of polynomial complexity given an LP approximation.
A simplex is a higher-dimensional version of a triangle. A simplex in 2D is a triangle, in 3D is
a tetrahedron and so forth. The simplex method works by checking boundary points of a region
enclosed by a system of linear inequalities. As seen from Fig 1, the optimal solution must reside at one
of the red corner points (vertices) for any objective function. If one of the vertices is not optimal then
by following the edge of one of the constraints that comprises that vertex, one can find a new vertex
with a better objective function value until there are no better vertices. The simplex method can be
thought of as an ameba that flows down through the valleys of an N-dimensional surface checking one
corner point after another and reflecting off the walls of the surface when it hits one. Eventually it will
always find the optimal vertex of the simplex formed by the system of inequalities, which gives the
optimal values of the N variables according to the objective function.
Figure 1

For this project the open-source program “lp_solve 3.0” was used as the solver for implementing the
Simplex Method, it is suitable for solving systems of several hundred variables and constraints.
In order to evaluate which moves are better than others in the look-ahead region a value-metric
was required. Several different choices were available but the standard Shannon Information / Entropy
criterion works well. Minimizing entropy was also the objective of the base policy in the rollout
algorithm. The entropy of the probability distribution at a cell m is defined to be
( )∑=
−=
C
0i
i)p(m,logi)p(m,E(m) where C is the number of different class / object types possible.
In this project C=2. Objects of class type 0 are labeled as hostile and objects of type 1 are labeled as
friendly. Equivalently, one could say class 0 objects are “interesting” and type 1 objects are not. By
minimizing the entropy of all cells the vehicles gain information about the probability distribution of
each cell and act to minimize the number of False Alarms and Miss Detections that occur when it is
time to decide which cells are friendly and which are not at the end of the simulation. The closer an
entropy becomes to 0, the more a probability density function looks like a Dirac delta function and the
more skewed the distribution is in favor of being one type of object or another.
Definition of the LP Problem:
In order to create an LP problem, a set of constraint equations needed to be specified so that the
Simplex Method would be guaranteed to only consider moves which are possible for the platforms to
make (on average). For a grid of size GxG, there will be G^2 possible tasks at each time for the G^2
cells in the grid. Every time a task is completed a sensor measurement was taken at the cell associated
with the task and information is gleaned which reduces the probability of classification error when
objects are classified. The problem is posed as follows:

Problem: max ∑tk,i,
t)k,x(i,v(i)
Subject to:
I. mnodes1t)k,x(m,
tk,
∀≤∑
II. t > 0 kt,m,t)(m,N-t)(m,N1)tk,,ny(m,t)k,m,,ny( se
)(Fn)(Fn
∀=+′−′ ∑∑ ∗∗
∈′∈′ mm
t = 0 1k,0),n,y(m
)m(Fn
0
0
=′∑∗
∈′
for m0
0
)m(Fn
mm0k,0),ny(m, ≠∀=′∑∗
∈′
III. ∑∗
∈′
∀′≤
)m(Fn
tk,m,t)k,m,,ny(t)k,x(m,
Constraint Equation I is just a set of bounds on every task that makes sure the LP problem only
tries to complete each task once. Equation II gives the important conservation of flow constraints for
each platform and specifies that only as many pieces of a platform as leave a cell can enter a cell. The
constraint is defined piece-meal because at time 0 there is a discontinuity where all of the platforms are
entering the system (the grid) for the first time. Thus there is a source of platforms at the base cell at
time 0 and all other cells generate no vehicles at any other time. The base cell does not source any
platforms after time 0 and conversely it doesn’t sink any other platforms except at time t=T. No other
cells sink sensor platforms at any time. The discontinuity of having a sink of platforms at the last time
t=T is a non-issue because no planning is done for any time after time T, and for time T-1 the planning
is trivial: all platforms should be one move away from home and must make the appropriate move to
return to base.
Two sets of bounds were added in the initialization of the LP problem for lp_solve. These
bounds are treated separately from how constraint equations are handled by lp_solve. The bounds are

a little redundant when the conservation of flow equations are in place, but they can be specified at
little extra computational cost.
x(m,k,t) ≤ 1, y(m,n,k,t) ≤ 1 tk,n,m,∀
Simulation:
The simulation for this project was created as a C++ program that interfaces with the C code
used in the lp_solve program. The simulator is a console application which is designed for large
numbers of Monte Carlo simulations, so all output is dumped to a log file and to a file which stores
simulation results. The program takes two command line arguments: a) the file name that contains all
of the simulation parameters, b) a file name to log error messages to. The convention used was to
name input files “input_G3_K2_T4.dat” for example and output files “output.dat.” The input file
contains parameters for G, K and T as well as FA / MD numbers, the number of Monte Carlo
simulations to run and statistics for the accuracy of the sensors. The order of the parameters in the
input file is documented with comments at the bottom of the input files (also see Appendix D.)
In the simulator the actual entropy-reward that is gained from visiting a cell is a random
variable and so the information that is gained from visiting a cell is not known in advance. Therefore a
random measurement is used to simulate the accuracy of the sensor reading that happens when a
platform passes by a cell; there are good measurements and there are bad measurements. Sometimes
measurements will lead the object classification algorithm (a Likelihood-Ratio Test) astray, however if
the sensors have any utility, then on average information is gained from visiting cells. Since the actual
reward is stochastic in nature, the simulator assigns expected rewards to visiting and taking a
measurement of each cell. By convention, the terminology “first-level entropy” is used to refer to the
value of visiting a cell once and the term “second-level entropy” is used for the value of visiting the
cell a second time. Good measurements cause these numbers to decrease monotonically.

The tricky part about the simulation was making the LP problem solved for each possible
platform move at time t+1 mesh with the problems that were being solved for all the possible moves at
time t. If the simulation does not initialize the LP problem at time t+1 in a manner consistent with the
intended state trajectory at time t, then the program would become schizophrenic and move
unpredictably.
Great length was taken in organizing the constraints’ rows and variables’ columns in the
definition of the LP problem for lp_solve so that its sparse matrix representation would be
conveniently organized in memory. Memory was allocated with the variables arranged backwards in
time so that after each time update the LP problem could be down-sized to correspond to the new
(smaller) planning-horizon, all data corresponding to early simulation steps is therefore located last in
memory. It was a simple matter to eliminate the last columns of the LP problem corresponding to the
previous simulation step, however in pairing down the LP problem after each time update, taking away
unused constraints means memory must be moved around because lp_solve stores variables in column
order and the constraints are represented as rows of non-zero coefficients in each column. With this
arrangement deleting columns is fast but deleting rows is slow and requires shuffling data.
So in the long run it actually proved faster not to resize the LP problem each round and instead
to use the LP variables’ bounds to keep the planning process synchronized with the simulation’s state
from one turn to the next. If the planning process dictated that a platform should move in a particular
direction then the lower bound on the flow corresponding to that platform at that time was set to 1.0.
These bounds acted as constraints that informed the LP solver that for all moves prior to the current
time there is no flexibility in assigning vehicle flows. This way the LP problem does not try to re-plan
for moves that have already happened. In addition, after each simulation step, all of the task variables
for the previous time step are zeroed out. This allows the LP problem to take a second look at cells

where it has been before without running into the constraint that says a tasks value can only be
collected once. The constraints specified by Equation I are meant to disallow a first-level entropy at a
cell from being collected twice. By zeroing out previous tasks and resetting the value of revisiting that
cell in the objective function to a new entropy value, the simulator can treat second-level entropies for
cells that have been visited as if they are first-level entropies. These modifications to the LP problem
after each time step act to enforce the desirable condition that tasks can only be completed at times
after the current time when the LP problem is solved once for every possible platform move during
each vehicle’s path-planning process. See Appendix A for more information on how the programs
main loop progresses.
Figure 2 displays a graphical representation of the course of one simulation when the simulator
is running in Debug mode (_DEBUG is defined). In this mode the simulator sends debug output to
several files (see Appendix C) and does not randomize the order in which the directions are checked
when the look-ahead routine is checking the possible moves of each vehicle. This setup makes tracing
through the code much easier but means that this example shows a bias because platforms have a
tendency for moving in the order the directions are enumerated in: N, E, S, W. (When entropy values
are compared, platforms will only move in a direction which happens later in this enumeration if the
entropy value is greater than the previous direction’s entropy when the program is in Debug mode.)
Appendix D contains a listing of all cost-to-go’s, optimal or not, along with a chart for how the
entropy coefficients of the LP problem change over time.
The simulator was run on a 1.2 Ghz Athlon machine running WinXP. The solver lp_solve was
able to handle 4x4 grids with 2-4 platforms and T = 4 however it sometimes complained of numerical
instabilities. A grid of size 4x4 is also problematic because it has no center cell. Therefore for the
purposes of creating this report problem sizes with 3x3 grids were used.

Figure 2
Problem sizes of G=4 and larger caused the lp_solve function solve() to terminate with a return value
of other than OPTIMAL which indicates that after several million iterations the solver still hasn’t
succeeded in finding the solution yet.
As far as the scalability of the simulator code goes, the computer the simulator was run on
could run 1000 Monte Carlo Simulations with the system G=3, K=2, T=4 in under a minute. For the
system G=4, K=2, T=4 each simulation takes around 1 sec. For a 5x5 system with K=2 and T=8 each
LP problem used to study a potential move took about 4 sec to solve.
Unfortunately, lp_solve 3.0 was unable to solve simulations with most of the parameter sets
that were intended to be analyzed. It could not run 1000 simulations for G=3, K=2 and T=6 without
failing to solve one of the LP problems of a potential move. The algorithm would solve most of the

LP’s with no complaints but every once and a while complain about numerical instability. Eventually
it would return a result of “no optimal solution found” at which point the solver was considered to have
crashed. This same behavior was observed for parameter sets such as (G=4, K=2, T=8) and (G=5,
K=1, T=8), only the solver crashed much faster.
In order to demonstrate that it was not the simulator code which had an instability, a trial
license for the commercial product “MOSEK Optimization Toolkit for Matlab v. 5” was obtained.
When the lp_solve problem failed to solve an LP problem that was given to it, the LP problem in
question was saved to disk in MPS format. Then MOSEK was used to read in the MPS file and solve
it, taking about 2.5 sec. This commercial program did not have any difficulty solving the LP problem
and specified that its result was the optimal solution. Therefore lp_solve is definitely not a program
that is suited for use with large simulations in this project and a license for a better solver needs to be
obtained. Unfortunately, the simulator code was designed for use with the lp_solve functional
interface in mind, not for the MOSEK toolkit for Matlab or the interface for the CPLEX solver. Even
if access to these commercial products was available, it would require rewriting a significant portion of
the simulator code as well as perhaps porting the project to the unix operating system.
A formula for the number of variables in the simulation is (8 + 12*(G-2) + 4*(G-
2)^2)*K*(T+1) + G^2*K*(T+1). So for (G=3, K=2, T=4) there are 330 variables in the LP problem.
For (G=4, K=2, T=4) there are 640. A system with (G=5, K=4, T=8) gives 3780 variables. Lastly, the
original system envisioned for simulation: (G=10, K=4, T=50) has 93,840 unknowns. None of these
larger systems could be analyzed with the lp_solve routine. The MOSEK toolkit was able to solve 5x5
grids without trouble although there was a lot of manual labor required in preparing an LP problem for
its use. From the documentation, the MOSEK program would no doubt have been able to solve the
system (G=10, K=4, T=50) however it had trouble reading in the MPS file. Since it could read in the

smaller MPS files okay, it was concluded that it is either not made to read in 18 MB MPS files or else
there was a limitation placed on reading in large files for the version of the program used.
Analysis:
This section will discuss the simulation results that were obtained in comparing the look-
ahead+rollout policy versus a simple myopic policy. The basic instrument for comparing the
performance of each strategy is the Receiver Operating Characteristic graph. This graph plots the
probability of a successful Detection of a target versus the probability of a False-Alarm for a given
policy. Minimizing one will necessarily cause an increase in the other. The closer the ROC curve of a
policy is to the left hand and top side of the graph the better the policy is. Each data point on these
graphs was obtained by running 1000 Monte Carlo Simulations and averaging the results together.
The following sets of parameters were chosen for comparison and were subject to the limitations of the
computing platform and the LP solver used: (G=3, K=1, T=4), (G=3, K=2, T=4), (G=3, K=4, T=4).
As mentioned previously, other systems that used a larger size grid or more simulation time-steps had
numerical instability problems and could not successfully run 1000 times without failing to solve an
LP problem. This limitation has another important effect on the course of each simulation. The
current version of the simulator is only built to deal with one level of lookahead and first-level
entropies. On a grid as small as 3x3, there are only 8 cells which have non-zero entropy values and are
thus worth visiting. (The base/home cell has no value for exploring.) Therefore for any system in
which K*T > G2
-1, there will be sensor platforms resting around idly during the path-planning process
because they can not see the second-level entropies and so they think there is nothing to do. At any
given time-step the current simulator will only let a task at a cell be done once and if there are enough
time steps for some of the vehicles to cover all of the cells, the remaining vehicles will not use their

time productively. The constraints still force the vehicles to move however and so they can still take
measurements at cells they visit (provided no one else is there), but these under-utilized vehicles move
stupidly. This problem provides one more reason why a better solver is needed or else the simulation
needs to be revamped to deal with multi-level look-aheads. A two-level look-ahead would more than
double the number of variables in the simulation though so there is no recourse for having a
professional solver as the back-end of this project.
ROC curves were generated in Excel for the parameter sets that were successfully simulated.
See Fig 3. The graphs allow for the various entropy and rollout-based policies to be compared with
each other. On account of the small grid sizes and the idleness of the platforms when K=4 in the
rollout policy, these ROC curves do not demonstrate that the rollout policy is superior to a simple
entropy-based one. This less than stellar performance from the rollout occurs because the simulations
were so simple that no real planning needed to be done. The single key factor that would really be
required to bring out the inherent potential of the rollout algorithm is a larger size grid. On such a
small system an entropy-based policy does very well with the addition of a simple heuristic to make
the platforms avoid each other; there is not really very far they can go and their one-move level of
foresight covers a large percentage of the area of the grid. If such a policy was used on a 5x5 grid then
simply avoiding other platforms would not help a platform to effectively take measurements on cells
that are important to learn about. One more reason why both policies perform very similarly is that the
cells on the grid are uniformly distributed and thus all have an initial entropy value that is the same.
After the first look these entropy numbers will start to fork apart in two different directions each time
they are measured, but these simulations generally only have enough time in them for the platforms to
make one pass by each cell.

Figure 3
The graphs in Fig. 3 actually plot points for the FA to MD ratios: 1:1, 1:5, 1:10, 1:20, 1:30,
1:50, 1:70, 1:90, 1:95 and 1:100. However the data points for Pr(FA) and PR(D) tend to line up on top
of each other and only generate a handful of distinct points. This situation is similar to how one would
react if asked to choose between having a gift worth > $20 or else having $5 cash. One would choose
the gift. One would continue to choose the gift even if offered $10 or $15 cash. In a similar way, only
when the ratio of FA’s gets above certain cut-off points does the Likelihood Ratio Test risk having one
more MD for fewer FA’s. Therefore there is a very discrete nature to the way the data points are
plotted on the ROC graph.
One good thing about these simulations is that they show the ROC curves consistently improve
when more platforms are available for making measurements, which is what they should do. The

curves do not point out much of a difference between the rollout and entropy policies except for the
case of K=2. For the most part they depend solely on the number of vehicles available for exploring
the grid. Were this a larger system, it would be expected that a rollout policy with fewer vehicles
could out-perform an entropy-based policy with more vehicles, but this conclusion can not be drawn
from the data collected in these simulations.
Conclusion:
In conclusion this project suffered for want of a professional-grade implementation of the
Simplex Method for solving Linear Programming problems. The simulator did not show too much
difference between a myopic policy and a more sophisticated look-ahead + rollout policy for the
vehicle routing problem, however it did provide some intelligent results for the simulation cases it
could handle. Further analysis is required before any conclusions about the performance of the
algorithms used in this problem can be drawn. In the meantime, the work undertaken here should
serve as a reliable foundation for developing simulators that are more powerful in the future.

References:
[1] Bertsekas, D.P., Castañon, D.A., “Rollout Algorithms for Stochastic Scheduling Problems,” Journal of Heuristics, V. 5, 1999.
[2] Bertsekas, D.P., Castañon, D.A., Curry, M.L., Logan D. , “Adaptive Multi-platform Scheduling in a Risky Environment”,
Proceeding of Symposium on Advances in Enterprise Control (San Diego, CA, November 1999).

Appendix A, Overview of Simulation’s Code:
Note: The simulation uses class CCoord to help do movement arithmetic. (All class types are prefaced
with a “C” to indicate they are C++ data types.) If an object of class CCoord called “pos” is created
then CCoord defines an overloaded member function associated with the “+” sign in the code such
that pos+dir evaluates to the coordinate offset by the direction. The directions are enumerated as 0..3
for N,E,S,W. For example (2,2) + 0 is interpreted as going north from cell (2,2) which is cell (2,3).
This program only implements a single level of look-ahead and the look-ahead step of the solution is
built into the LP approximation used for the rollout step. The function CBase::GetEntropyAt() was
meant to be used to retrieve expected entropy values at different look-ahead levels in the future,
however it currently only supports one level. These expected entropy values are different than the
actual ones obtained during the measurements taken by the platforms in CBase::ScanObjects because
the latter are stochastic quantities, which are determined by what kind of sensor observation is made.
In order to have more than one level of look-ahead several changes must be made:
a. There are 2N
number of calls to a “calculate entropy” function each time a cell is visited and
approximately 4N
LP problems to be solved per sensor where N is the number of levels ahead
to look, so the complexity explodes for even small values of look-ahead. For two platforms
with a look-ahead of 2, the LP problem would have to be solved 32 times per time step in the
simulation if the platforms were in the interior of the grid.
b. A full set of constraints (all three equations) would need to be added for each tier of entropy
values included. A look-ahead of 2 would touch on some second level entropies and would
thus require conservation of flow constraints etc and would double the problem’s complexity.
c. The variable-encoding scheme within the vector x in the LP problem “Ax [<=>] b” has to
have another layer of complexity in it to account for the x's and y's of the different entropy
levels. Therefore “GetYPtr_time()” and “GetXPtr_time()” must change and become functions
of look-ahead depth as well. See “5)” below.
d. Need to add constraints to guarantee that the first level entropies are picked up before the
second level ones because the entropy values may not be monotonically decreasing and it
would be non-causal if the solver attempted to collect values out of order.
Pseudocode:
1) Instantiate the global object “simConsts” of type CSimConstants that contains read-only values of all simulation
parameters in one place where they are easy to get to from all program scopes. This class opens the input and
output files and reads in the parameters when it is constructed.
2) Instantiate a CGrid object that contains the current state of all the cells’ probability vectors. A CGrid object
creates CCell objects for each of the cells in the grid, G2
in all. Call CGrid’s member function
InitiatizeEntropyGrid() to allocate a matrix of storage space for caching current entropy values of each cell in the
grid.
3) Instantiate a CBase object, which creates one CSearcher object for each platform, specified by the value of K.
CSearcher objects represent the simulation’s platforms and are initialized to start at home position at the base.
CBase creates a set of CGrid objects called “futureGrids[]” for its internal use to cache entropy values used in the
look-ahead process. This way the computational overhead of calculating entropy values into the future is reduced
because the entropies can be reused and only need to be changed if a platform actually visits a cell (not just looks
at it in the planning process.)

4) CBase creates a CLPInterface object called “VRPlp” in its constructor. This object acts as the interface between
the C++ code of the simulation and the C code of lp_solve. When this object is constructed, it allocates the
memory needed by lp_solve for the solution of an LP problem. The CLPInterface object uses a member variable
called “oneConstraintRow[]” which is used to add constraint equations to the LP problem contained within the
CLPInterace object. A related pointer variable called “arrayBase” is set to the value &oneConstraintRow[1].
lp_solve indexes arrays from 1..N and so the simulator references arrayBase from 0..N-1, which accesses
oneConstraintRow[] from 1..N where N is its length. This memory is used to set constraint coefficients and add
constraints using the lp_solve function add_constraint. lp_solve searches the array oneConstraintRow[] for non-
zero coefficients and allocates memory in its sparse matrix data structure for storing non-zeros in the constraint
equations at these locations. All of the CLPInterface functions AddConstraintEqOne(), AddConstraintEqTwo(),
AddConstraintEqThree() assume that the array oneConstraintRow[] is all zeros before using it. Recycling the
array by just turning on coefficients, adding a constraint and then turning off those coefficients is much faster than
allocating a large array each time even though the array is only one-dimensional. Two flags in the file
SimConstants.h are used to determine how the problem is constructed. The flag “CREATE_LP_FILE” tells the
program whether or not it should store the original LP problem at t=0 in a format which can be read by the
simulator later on. The code stores LP problems in a format with an “.lp” extension. Storing the LP to a file
rather than creating a new one can allow the program to initialize much faster for large LP problems with
thousands of unknowns and constraints. A second flag “READ_FROM_OLD_FILE” in SimConstants.h specifies
whether or not the program should read in an old file or go through the work of generating a new LP problem,
which requires running more initialization to generate the problem’s constraint equations. These flags work
together. *Important note, the lp_solve fails to read in files with long variable and constraint names. Therefore
the flag NAME_VRP is used in simConstants.h to leave a *.lp file unnamed when it is stored. Then by turning the
flag back on the simulator can read in the unnamed LP problem and give it some easier to read names after the
read process is completed.
5) If required the code in CLPInterface will create a fresh LP problem and generate constraints for it, otherwise it will
read in the problem (in its initial state) from the file. Generating constraint equations from scratch is done by
calling a function for each of the three main equations of the simulation along with a function to set the
coefficients in the objective function to the appropriate entropy values. No value is assigned to visiting the home
cell. Two important functions used in this process are “GetYPtr_time” “GetXPtr_time”. These functions return
an index into the encoded array of variables that becomes the “x” in the system Ax [< = >] b that lp_solve will
solve. GetYPtr_time returns a pointer to a flow variable as a function of the time, platform, coordinate and
direction it is associated with. Similarly, the function GetXPtr_time returns a pointer to a task variable as a
function of the time, platform and coordinate associated with it. These pointers are either used directly or pointer
arithmetic is done to find the index value of variable from the beginning of the array. (See Appendix B for
information on how the LP variables are encoded in an array.) lp_solve has a non-standard convention because it
indexes columns from 1 to N and not 1 to N-1 as is the standard C convention. Rows are indexed from 1 to N as
well. Row 0 in the lp_solve code is used for storing the objective function, not a constraint equation.
6) The program begins its main loops: it loops over simulations and within each simulation over simulation times
from t = 0 to t ≤ T. The main loop proceeds as follows:
a) Call CBase::Update()
1) If not at the first step of the simulation:
i) Call CBase::ScanObjects() to do the sensor measurements for each platform and update the state
of each cell on which a measurement was performed. Update the cached entropy values (in the
main CGrid object) which were affected at cells where the platforms currently reside. When two
platforms are co-located, a cell is only observed and updated once.
ii) Call CBase::InitializeData() to update the state of the LP problem by adjusting the entropy
coefficients in the objective function of the affected cells for all future times. These coefficients
are shared so if platform k0 does a measurement at cell index 3 at time 1 then the new entropy
value will be stored for all 1 < t ≤ T for all platforms 0 ≤ k ≤ K. For example this means that
measurements taken by platform k0 will affect the coefficients in the LP problem for k1 and
vice-versa in a two-platform simulation. Thus information is shared. In order to enforce that
only future task assignments are allowed in an LP problem for t > 0, the task variables for all
cells that are visited are set to have an upper bound of 0.

2) Do the Look-ahead Process to plan the next platform movements:
i) Construct an object of class CRandomDirMap to randomize the directions platforms move in.
ii) For k = 0..K-1, loop over the possible directions that vehicle k can move in:
a) Call FindPossibleDirections() to figure out how many ways platform k can move from its
current position. FindPossibleDirections() will filter out directions which are in the grid but
which would lead a platform out of range of the base by the end of the simulation.
b) If in Release mode and not Debug mode, each time k changes value, assign a new
enumeration to the directions N, E, S, W with CRandomDirMap::Reset()
c) If the current move = pos+dir for platform k is valid then call CLPInterface::Solve() to
calculate the future cost-to-go from this possible future state:
1) Set a temporary bound to force platform k on the next turn to move to cell pos+dir
where “pos” is its current position. The move is forced by setting the appropriate flow
variable’s lower bound to be 1.0.
2) Call lp_solve to get the cost-to-go of this possible future state
3) Remove the temporary bound by setting the lower bound of the flow variable back to 0.
d) Check if the cost-to-go that is returned is better than the best one found so far for the other
moves. If so store the value in the variable in “maxEntropy” and the current direction “dir”
to the variable “bestDir”.
e) After looping over all possible directions, choose the best direction, store its value in the
array sensorMoves[] and set the bound for that flow back to 1.0 and leave it that way.
3) Move the sensor platforms:
i) Move the platforms one unit in the direction stored in sensorMoves[]
ii) Update the number of visits stored in each cell that a platform moves to.
b) If at the last time in the current simulation, calculate the cost information, update statistics for the
simulation and increase the simulation count.
7) If all the simulations are done let the objects go out of scope and their destructors will deallocate all of the memory
that was used.

Appendix B, LP Variable Encoding Scheme:
This appendix contains information about the packing scheme that the simulation uses for
storing the LP variables in the array that represents the x vector in the equation Ax [< = >] b. Task
variables X are stored in a three-level hierarchy while flow variables Y are stored in a four-level
hierarchy. (If this simulation was modified to use look-ahead values of > 1 then another level of
encoding would have to be added to both encoding schemes). The member functions GetYPtr_time()
and GetXPtr_time() within the CLPInterface class are responsible for doing this encoding work. There
are a set of functions {GetX(), GetY() and SetX(), SetY()} in this class which use the pointers returned
by GetYPtr_time() and GetXPtr_time() to read and set task and flow variables. The functions
Get*Ptr_time() get a pointer to the first byte of the chunk of memory associated with that time and then
call Get*Ptr_sensor() to find the first byte of the chunk of memory associated with that sensor
platform. In a similar way Get*Ptr_sensor() ends up calling Get*Ptr_coord(). At the bottom most
level GetYPtr_coord() will call GetYPtr_dir(). The task variables don’t have a fourth level in their
packing scheme. All of these functions are declared as inline functions, which effectively removes the
function calls and substitutes the code in place of the function call once the program is compiled in
Release mode. This makes the program easy to debug in Debug mode and it will still run without too
much function call overhead in Release mode. Here is a graphical representation of the encoding
scheme:
X and Y: First grouping by time:
ex using T=50, k = 0, m = (0,0)
y(m,dir,k,t) and x(m,k,t):
+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|y_50 |x_50 |y_49 |x_49 | | ... | y_0 | x_0 |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+
X and Y: Second grouping by sensor:

y_50(m,dir,k):
+-----+-----+-----+-----+-----+---------+
|y_k0 |y_k1 |y_k2 | ... | y_k(K-1)|
+-----+-----+-----+-----+-----+---------+
x_50(m,k):
+-----+-----+-----+-----+-----+---------+
| x_k0| x_k1| x_k2| ... | x_k(K-1)|
+-----+-----+-----+-----+-----+---------+
X and Y: Third grouping by cell:
ex y_(50,k=0)(m,dir):
+------+------+------+------+------+------+------+------+------+---
|y(0,0)|y(1,0)|y(2,0)| ... |y(0,1)|y(1,1)|y(2,1)|y(3,1)|
+------+------+------+------+------+------+------+------+------+---
------+------------+
... | y(G-1,G-1) |
------+------------+
ex x_(50,k=0)(m):
+------+------+------+------+------+------+------+------+------+---
|x(0,0)|x(1,0)|x(2,0)| ... |x(0,1)|x(1,1)|x(2,1)|x(3,1)|
+------+------+------+------+------+------+------+------+------+---
------+------------+
... | x(G-1,G-1) |
------+------------+
Y: Fourth Grouping by Direction of Flow:
ex y_(50,k=0,(0,0))(dir):
+-----+-----+
| y_N | y_E | (The number of arcs varies with y’s position in the grid)
+-----+-----+

Appendix C, Simulator Debug Information:
There are several functions that were used extensively in debugging the simulator. The
program is single-threaded and can thus be stepped through line-by-line when necessary. The only
exception to this ability is in the code of lp_solve which reads in *.lp files. The lp_solve function
read_lp_file() uses code in ytab.c and lex.c, which implement the yacc and lex parsing functions ported
from the unix operating system. Unfortunately, ytab.c actually includes the source code of lex.c into
its own file. This very atypical maneuver makes both file’s code very difficult to debug. This is
further complicated by the fact there are numerous goto’s in the files as well as #line directives which
arbitrarily label where the compiler says it is in the program’s compilation. This code is a mess and
cannot be stepped through in any kind of a reasonable fashion. It is also written to be cross-platform
compatible so there are layers upon layers of #ifdef … #endif statements to wade through: many
functions in the files have different versions corresponding to what platform or compiler is being used
to compile them. If break-points were set at the top of every function in the file ytab.c and the program
was stepped through, one could possibly learn enough about the programs flow to convert the
spaghetti-code into a more modern procedurized program. It might be easier to just rewrite the whole
parser, though there is a large body of code in the files read.c, ytab.c and lex.c.
With regard to testing the output of lp_solve and its ability to read in *.lp files, the following
three member functions of class CLPInterface were used: PrintSolution(), PrintLP() and WriteLP().
The first function, PrintSolution() will print the LP variables to file "Solution.dat" once an LP program
has been solved by calling Solve(). This file prints out a list of every task and flow variable at every
time for that solution. PrintLP() can be called with a filename such as “VRP.dat” to print out the entire
matrix structure A and the right hand side vector b in the equation “Ax [< = >] b” in dense format. For
small simulations where there are less than 2048 columns in the dense matrix, Microsoft Excel allowed

for the text file to be imported and viewed relatively easily. For larger simulations it was necessary to
either go through the labor of telling Excel to skip over importing some columns or use another
program such as Word and view the matrix’s coefficients in a much less orderly format. Lastly, the
member function WriteLP() can be used for debug purposes as well as storing an LP problem to a file
in its initial state. This is the function which will write out an LP problem to a file with a *.lp
extension. It is useful for looking at the LP problem in sparse format without all the extra zeros. All
of these test / debug functions are generally wrapped in “#ifdef _DEBUG … #endif ” preprocessor
statements so that they will be stripped out when the program is compiled in Release mode and thus
won’t waste processor time on I/O operations when the program is doing a real simulation.
To ease the difficulty of the debugging process, the #define’d constant “NAME_VRP” can be
set in simConstants.h to TRUE which will cause the Initialization routine in CLPInterface to label the
constraint rows and variable names of the LP problem. That way when debug output is dumped to a
file with one of the aforementioned functions, it is much easier to read the constraint equations and see
what is going on. Each constraint row has a label such as “Ey(n',2,k1,6)-Ey(2,n',k1,7) = 0.00”
to specify the purpose of that constraint. One important note: the letter “E” was used not to mean
expectation but as a space-abbreviated representation of the symbol “∑” for summation. This
particular example is a conservation of flow constraint for platform k1 that says there must be as many
vehicles flowing out of cell index 2 at time 7 as there are flowing into cell index 2 at time 6.
For debug output the class CSimConstants ties the output file (argv[2]) to the standard error
object “cerr”. By using “cerr << “a string” << endl;” throughout the program (after the simConsts
constructor), error reports can be logged to a file in this fashion. The default name for this log file is
“output.dat”.

Appendix D, Simulator Trial Run:
/* ------------------------------------------------------------------------------ */
input.dat:
3 2 2 1
0.3 0.70
1.0
1.0 10.0
0.10 0.90
100 4
// 1: <GRID_SIZE> <K> <numOfClasses> <numOfModes>
// 2: <init-prob-type1> <init-prob-type2> ... <init-prob-type(numOfClasses-1)>
// 3: <time-mode1> <time-mode2> ... <time-mode(numOfModes)>
// 4: <FA cost> <MD cost>
// 5: <prob-y=0-mode1-type1> <prob-y=0-mode1-type2> ... <prob-y=0-mode1-type(numOfClasses)>
// 6: <prob-y=0-mode2-type1> <prob-y=0-mode2-type2> ... <prob-y=0-mode2-type(numOfClasses)>
// 7: <prob-y=0-modeN-type1> <prob-y=0-modeN-type2> ... <prob-y=0-modeN-type(numOfClasses)>
// 8: <maxNumOfSimulations> <T>
/* ------------------------------------------------------------------------------ */
t = 0:
+-----+-----+-----+
|0.316|0.316|0.316|
| 6 | 7 | 8 |
+-----+-----+-----+
|0.316| 0.00|0.316|
| 3 |K0 K1| 5 |
+-----+-----+-----+
|0.316|0.316|0.316|
| 0 | 1 | 2 |
+-----+-----+-----+
sensorPos[0] == (1,1) sensorPos[1] == (1,1)
Objective Function Coeffs for t == 0:
X(0,k0,t4) X(1,k0,t4) X(2,k0,t4) X(3,k0,t4) X(4,k0,t4) X(5,k0,t4) X(6,k0,t4) X(7,k0,t4) X(8,k0,t4)
0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316

0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
lp_solve()'s costs-to-go for prospective directions:
t==0, k==0, dir==N, entropyToGo == 1.8957150269382, k0 chooses N (for debugging, turned off logic to break ties
randomly)
t==0, k==0, dir==E, entropyToGo == 1.8957150269382
t==0, k==0, dir==S, entropyToGo == 1.8957150269382
t==0, k==0, dir==W, entropyToGo == 1.8957150269382
t==0, k==1, dir==N, entropyToGo == 1.8957150269382
t==0, k==1, dir==E, entropyToGo == 1.8957150269382, k1 chooses E
t = 1:
+-----+-----+-----+
|0.316|0.073|0.316|
| 6 | K0 | 8 |
+-----+-----+-----+
|0.316| 0.00|0.253|
| 3 | 4 | K1 |
+-----+-----+-----+
|0.316|0.316|0.316|
| 0 | 1 | 2 |
+-----+-----+-----+
0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
0.316 0.316 0.316 0.316 0.00 0.00 0.316 0.00 0.316
0.316 0.316 0.316 0.316 0.00 0.00 0.316 0.00 0.316
0.316 0.316 0.316 0.316 0.00 0.00 0.316 0.00 0.316
0.316 0.316 0.316 0.316 0.00 0.00 0.316 0.00 0.316

t==1, k==0, dir==W, entropyToGo == 1.2638100179588, k0 chooses W
t==1, k==1, dir==S, entropyToGo == 1.2638100179588, k1 chooses S
t = 2:
+-----+-----+-----+
|0.073|0.073|0.316|
| K0 | 7 | 8 |
+-----+-----+-----+
|0.316| 0.00|0.253|
| 3 | 4 | 5 |
+-----+-----+-----+
|0.316|0.316|0.253|
| 0 | 1 | K1 |
+-----+-----+-----+
0.316 0.316 0.253 0.316 0.00 0.253 0.073 0.073 0.316
0.316 0.316 0.253 0.316 0.00 0.253 0.073 0.073 0.316
0.316 0.316 0.253 0.316 0.00 0.253 0.073 0.073 0.316
0.316 0.316 0.253 0.316 0.00 0.253 0.073 0.073 0.316
0.316 0.316 0.00 0.316 0.00 0.253 0.00 0.073 0.316
0.316 0.316 0.00 0.316 0.00 0.253 0.00 0.073 0.316
0.316 0.316 0.00 0.316 0.00 0.00 0.00 0.00 0.316
0.316 0.316 0.00 0.316 0.00 0.00 0.00 0.00 0.316
0.316 0.316 0.00 0.316 0.00 0.00 0.00 0.00 0.316
0.316 0.316 0.00 0.316 0.00 0.00 0.00 0.00 0.316
t==2, k==0, dir==S, entropyToGo == 0.63190500897941, k0 chooses S
t==2, k==1, dir==W, entropyToGo == 0.63190500897941, k1 chooses W

t = 3:
+-----+-----+-----+
|0.073|0.073|0.316|
| 6 | 7 | 8 |
+-----+-----+-----+
|0.253| 0.00|0.253|
| K0 | 4 | 5 |
+-----+-----+-----+
|0.316|0.073|0.253|
| 0 | K1 | 2 |
+-----+-----+-----+
0.316 0.073 0.253 0.253 0.00 0.253 0.073 0.073 0.316
0.316 0.073 0.253 0.253 0.00 0.253 0.073 0.073 0.316
0.316 0.00 0.253 0.00 0.00 0.253 0.073 0.073 0.316
0.316 0.00 0.253 0.00 0.00 0.253 0.073 0.073 0.316
0.316 0.00 0.00 0.00 0.00 0.253 0.00 0.073 0.316
0.316 0.00 0.00 0.00 0.00 0.253 0.00 0.073 0.316
0.316 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.316
0.316 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.316
0.316 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.316
0.316 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.316
(Don't bother calling lp_solve for last move)
t==3, k==0, dir==E, entropyToGo == 0.0, k0 forced to move E, no entropy at base's coordinate
t==3, k==1, dir==N, entropyToGo == 0.0, k1 forced to move N, no entropy at base's coordinate
t = 4:
+-----+-----+-----+
|0.073|0.073|0.316|
| 6 | 7 | 8 |
+-----+-----+-----+
|0.253| 0.00|0.253|
| 3 |K0 K1| 5 |
+-----+-----+-----+
|0.316|0.073|0.253|
| 0 | 1 | 2 |
+-----+-----+-----+
t==4, no more moves to plan

MS Project

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (18)

Similar to MS Project

Similar to MS Project (20)

MS Project