# Receding Horizon Stochastic Control Algorithms for Sensor Management ACC 2010


algorithms are scalable to large numbers of tasks, and suitable for real-time SM. We also extend the model of [21] to incorporate search actions in addition to classification. We evaluate alternative approaches for obtaining feasible decision strategies, and evaluate the resulting performance of the RH algorithms using multi-sensor simulations. Our simulation results demonstrate that our RH algorithms achieve performance comparable to the predicted lower bound of [21], and shed insight into the relative value of different strategies for partitioning sensor resources either geographically or by sensor specialization.

The rest of this paper is organized as follows: Section II describes the formulation of the stochastic SM problem. Section III provides an example of the column generation technique for generating mixed strategies for SM. Section IV discusses how we create feasible, sequenced sensor schedules from these mixed strategies. Section V documents our simulation results for various scenarios. Section VI summarizes our results and discusses areas for future work.

## II. PROBLEM FORMULATION AND BACKGROUND

The problem formulation is an extension of the POMDP formulation presented in [21]. Assume that there are a finite number of locations 1, …, N, each of which may have an object of a given type, or may be empty. Assume that there is a set of S sensors, each of which has multiple sensor modes, and that each sensor can observe one and only one location at each time with a selected mode. Let x_i ∈ {0, 1, …, D} denote the state of location i, where x_i = 0 if location i is unoccupied, and otherwise x_i = k > 0 indicates location i has an object of type k. Let π_i(0) ∈ Δ^{D+1} (the probability simplex) be a discrete probability distribution over the possible states of the i-th location for i = 1, …, N, where D ≥ 2. Assume additionally that the random variables x_i, i = 1, …, N are mutually independent.

There are s = 1, …, S sensors, each of which has m = 1, …, M_s possible modes of observation. We assume there is a series of T discrete decision stages at which sensors can select which location to measure, where T is large enough so that all of the sensors can use their available resources. At each stage, each sensor can choose to employ one and only one of its modes on a single location to collect a noisy measurement concerning the state x_i at that location. Each sensor s has a limited set of locations that it can observe, denoted by O_s ⊆ {1, …, N}. A sensor action by sensor s at stage t is a pair

$$u_s(t) = (i_s(t), m_s(t)) \tag{1}$$

consisting of a location to observe, i_s ∈ O_s, and a mode for that observation, m_s. Sensor measurements are modeled as belonging to a finite set y ∈ {1, …, L_s}. The likelihood of the measured value is assumed to depend on the sensor s, sensor mode m, location i, and the true state x_i at the location, but not on the states of other locations. Denote this likelihood as P(y | x_i, i, s, m). We assume that this likelihood is time-invariant, and that the random measurements y_{i,s,m}(t) are conditionally independent of other measurements y_{j,σ,n}(τ) given the location states x_i, x_j for all sensor modes m, n, provided i ≠ j or τ ≠ t.

Each sensor has a limited quantity R_s of resources available for measurements. Associated with the use of mode m by sensor s on location i is a resource cost r_s(u_s(t)), representing power or some other type of resource required to use this mode from this sensor:

$$\sum_{t=0}^{T-1} r_s(u_s(t)) \le R_s \quad \forall\, s \in \{1, \dots, S\} \tag{2}$$

This is a hard constraint for each realization of observations and decisions. Let I(t) denote the sequence of past sensing actions and measurement outcomes up to and including stage t − 1:

$$I(t) = \{(u_s(k), y_s(k)) : s = 1, \dots, S;\ k = 0, \dots, t-1\}$$

Under the assumption of conditional independence of measurements and independence of individual states at each location, the conditional probability of (x_1, …, x_N) given I(t) can be factored as a product of belief states at each location. Denote the belief state at location i as π_i(t) = p(x_i | I(t)). When a sensor measurement is taken, the belief state is updated according to Bayes' rule. A measurement of location i with the sensor-mode combination u_s(t) = (i, m) at stage t that generates observable y(t) updates the belief vector as

$$\pi_i(t+1) = \frac{\operatorname{diag}\{P(y(t)\,|\,x_i = j, i, s, m)\}\,\pi_i(t)}{\mathbf{1}^{T}\operatorname{diag}\{P(y(t)\,|\,x_i = j, i, s, m)\}\,\pi_i(t)} \tag{3}$$

where **1** is the (D + 1)-dimensional vector of all ones. Eq. (3) captures the relevant information dynamics that SM controls.

In addition to information dynamics, there are resource dynamics that characterize the available resources at stage t. The dynamics for sensor s are given as

$$R_s(t+1) = R_s(t) - r_s(u_s(t)); \qquad R_s(0) = R_s \tag{4}$$

These dynamics constrain the admissible decisions by a sensor, in that it can only use modes that do not use more resources than are available.

Given the final information I(T), the quality of the information collected is measured by making an estimate of the state of each location i given the available information. Denote these estimates as v_i, i = 1, …, N. The Bayes cost of selecting estimate v_i when the true state is x_i is denoted as c(x_i, v_i) ∈ ℝ, with c(x_i, v_i) ≥ 0. The objective of the SM stochastic control formulation is to minimize

$$J = \sum_{i=1}^{N} E[c(x_i, v_i)] \tag{5}$$

by selecting adaptive sensor control policies and final estimates subject to the dynamics of Eq. (3) and the constraints of Eq. (4) and Eq. (2).

The results in [21] provide an SDP algorithm to solve the above problem, with cost-to-go at stage t depending on the joint belief state π(t) = [π_1(t), …, π_N(t)] and the residual resource state R(t) = [R_1(t), …, R_S(t)]. Because of this dependency, the cost-to-go does not decouple over locations. This leads to a very large POMDP problem with combinatorially many actions and an underlying belief state of dimension (D + 1)^N that is computationally intractable unless there are very few locations.

In [21], the above stochastic control problem was replaced with a simpler problem that provides a lower bound on the optimal cost, by expanding the set of admissible strategies and replacing the constraints of Eq. (2) by the "soft" constraints

$$E\!\left[\sum_{t=0}^{T-1} r_s(u_s(t))\right] \le R_s \quad \forall\, s \in \{1, \dots, S\} \tag{6}$$

To solve the simpler problem, [21] proposed incorporating the soft constraints of Eq. (6) into the objective function using Lagrange multipliers λ_s for each sensor s. The augmented objective function is

$$\bar{J}_\lambda = J + \sum_{t=0}^{T-1} \sum_{s=1}^{S} \lambda_s E[r_s(u_s(t))] - \sum_{s=1}^{S} \lambda_s R_s \tag{7}$$

A key result in [21] was that when the optimization of Eq. (7) was done over mixed strategies for given values of the Lagrange multipliers λ_s, the stochastic control problem decoupled into independent POMDPs for each location, and the optimization could be performed using feedback strategies for each location i that depend only on the information collected for that location, I_i(t). These POMDPs have an underlying information state-space of dimension D + 1, corresponding to the number of possible states at a single location, and can be solved efficiently.

Because the measurements and possible sensor actions are finite-valued, the set of possible SM strategies Γ is also finite. Let Q(Γ) denote the set of mixed strategies that assign probability q(γ) to the choice of strategy γ ∈ Γ. The problem of finding the optimal mixed strategies can be written as

$$\min_{q \in Q(\Gamma)}\ \sum_{\gamma \in \Gamma} q(\gamma)\, E_\gamma[J] \tag{8}$$

$$\text{s.t.}\quad \sum_{\gamma \in \Gamma} q(\gamma)\, E_\gamma\!\left[\sum_{i=1}^{N} \sum_{t=0}^{T-1} r_s(u_s(t))\right] \le R_s, \quad s \in \{1, \dots, S\} \tag{9}$$

$$\sum_{\gamma \in \Gamma} q(\gamma) = 1 \tag{10}$$

where we have one constraint for each of the S sensor resource pools, and an additional simplex constraint in Eq. (10) which ensures that q ∈ Q(Γ) forms a valid probability distribution.
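For concreteness, the information dynamics of Eq. (3) and the resource dynamics of Eq. (4) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the likelihood values below are hypothetical:

```python
# Bayes update of Eq. (3) for a single location, plus the resource
# bookkeeping of Eq. (4).  Likelihood values are illustrative only.

def belief_update(pi, likelihood_col):
    """pi: belief over the D+1 states of one location.
    likelihood_col: P(y | x_i = j, i, s, m) for the observed y, one entry per j."""
    unnorm = [l * p for l, p in zip(likelihood_col, pi)]
    z = sum(unnorm)                 # the normalizer 1^T diag{P(y|.)} pi(t)
    return [u / z for u in unnorm]

def spend(R_s, cost):
    """Eq. (4): decrement the sensor's residual resource state."""
    assert cost <= R_s, "action infeasible: not enough resources left"
    return R_s - cost

# Two-state location (D = 1): uniform prior, sensor with 0.9/0.1 confusion.
pi = belief_update([0.5, 0.5], [0.9, 0.1])   # -> [0.9, 0.1]
R = spend(100.0, 1.0)                        # -> 99.0
```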
The problem of Eqs. (8)–(10) is a large linear program (LP) whose variables correspond to the strategies in Γ. However, the total number of constraints is S + 1, which establishes that optimal solutions of this LP are mixtures of no more than S + 1 strategies. Thus, one can use a column generation approach [22], [23], [24] to quickly identify an optimal mixed strategy. In this approach, one solves Eq. (8) and Eq. (9) restricting the mixed strategies to mixtures over a small subset Γ′ ⊂ Γ. The solution of the restricted LP has optimal dual prices λ_s, s = 1, …, S. Using these prices, one can determine a corresponding optimal pure strategy by minimizing

$$J_\lambda = \sum_{i=1}^{N} E[c(x_i, v_i)] + \sum_{t=0}^{T-1} \sum_{s=1}^{S} \lambda_s E[r_s(u_s(t))] - \sum_{s=1}^{S} \lambda_s R_s \tag{11}$$

which the results in [21] show can be decoupled into N independent optimization problems, one for each location. Each of these problems is solved as a POMDP using standard algorithms such as point-based value iteration (PBVI) [25] to determine the best pure strategy γ1 for these prices. If the best pure strategy γ1 is already in the set Γ′, then the solution of Eq. (8) and Eq. (9) restricted to Q(Γ′) is an optimal mixed strategy over all of Q(Γ). Otherwise, the strategy γ1 is added to the admissible set Γ′, and the iteration is repeated. The result is a set of mixed strategies that achieve a performance level that is a lower bound on the original SM optimization problem with hard constraints.

## III. COLUMN GENERATION AND POMDP SUBPROBLEM EXAMPLE

We present an example to illustrate the column generation and POMDP algorithms discussed previously. In this simple example we consider 100 objects (N = 100), 2 possible object types (D = 2) with X = {non-military vehicle, military vehicle}, and 2 sensors that each have one mode (S = 2 and M_s = 1 ∀ s ∈ {1, 2}). Sensor s actions have resource costs r_s, where r_1 = 1 and r_2 = 2.
Sensors return 2 possible observation values, corresponding to binary object classifications, with likelihoods

$$P(y_{i,1,1}(t)\,|\,x_i, u_1(t)) = \begin{bmatrix} 0.90 & 0.10 \\ 0.10 & 0.90 \end{bmatrix}, \qquad P(y_{i,2,1}(t)\,|\,x_i, u_2(t)) = \begin{bmatrix} 0.92 & 0.08 \\ 0.08 & 0.92 \end{bmatrix}$$

where the (j, k) matrix entry denotes the likelihood that y = j if x_i = k. The second sensor has 2% better performance than the first sensor but requires twice as many resources to use. Each sensor has R_s = 100 units of resources, and can view each location. Each of the 100 locations has a uniform prior of π_i = [0.5 0.5]^T ∀ i. For the performance objective, we use c(x_i, v_i) = 1 if x_i ≠ v_i, and 0 otherwise, so the cost is 1 unit for a classification error.

Table I demonstrates the column generation solution process. The first three columns are initialized by guessing values of resource prices and obtaining the POMDP solutions, yielding expected costs and expected resource use for each sensor at those resource prices. A small LP is solved to obtain the optimal mixture of the first three strategies γ1, …, γ3, and a corresponding set of dual prices. These dual prices are used in the POMDP solver to generate the fourth column γ4, which yields a strategy that is different from that of the first 3 columns. The LP is re-solved for mixtures of the first 4 strategies, yielding new resource prices that are used to generate the next column. This process continues until the solution using the prices after 7 columns yields a strategy that was already represented in a previous column, terminating the algorithm.

| | γ1 | γ2 | γ3 | γ4 | γ5 | γ6 | γ7 | |
|---|---|---|---|---|---|---|---|---|
| min | 50.0 | 2.80 | 2.44 | 1.818 | 8 | 10 | 6.22 | |
| R1 | 0 | 218 | 200 | 0 | 0 | 100 | 150 | ≤ 100 |
| R2 | 0 | 0 | 36 | 800 | 200 | 0 | 18 | ≤ 100 |
| Simplex | 1 | 1 | 1 | 1 | 1 | 1 | 1 | = 1 |
| Optimal cost | – | – | 26.22 | 21.28 | 7.35 | 5.95 | **5.95** | |
| Mixture weights | 0 | **0.424** | 0 | 0 | **0.500** | **0.076** | 0 | |
| λᶜ₁ | 1.0e15 | 0.024 | 0.010 | 0.238 | 0.227 | 0.217 | 0.061 | |
| λᶜ₂ | 1.0e15 | 0.025 | 0.015 | 0 | 0.060 | 0.210 | 0.041 | |

TABLE I: Column generation example with 100 objects. The tableau is displayed in its final form after convergence. λᶜₛ describe the λ trajectories up until convergence. R1 and R2 are resource constraints. γ1 is a 'do-nothing' strategy. Bold numbers represent useful solution data.

Fig. 1: The 3 policy graphs that correspond to columns 2, 5 and 6 of Table I. The frequency of choosing each of these 3 strategies is controlled by the relative proportion of the mixture weight q_c ∈ (0, 1) with c ∈ {2, 5, 6}.

The optimal mixture combines the strategies of the second, fifth and sixth columns. When the master problem converges, the optimal cost J* for the mixed strategy is 5.95 units. The resulting policy graphs are illustrated in Fig. 1, where branches up indicate measurements y = 1 ('non-military') and branches down y = 2 ('military'). The red and green nodes denote the final decision, v_i, for a location. Note that the strategy of column 5 uses only the second sensor, whereas the strategies of columns 2 and 6 use only the first sensor. The mixed strategy allows the soft resource constraints to be satisfied with equality. Table I also shows the resource costs and expected classification performance of each column. The example illustrates some of the issues associated with the use of soft constraints in the optimization: the resulting solution does not lead to SM strategies that will always satisfy the hard constraints of Eq. (2). We address this issue in the subsequent section.

## IV. RECEDING HORIZON CONTROL

The column generation algorithm described previously solves the approximate SM problem with "soft" constraints in terms of mixed strategies that, on average, satisfy the resource constraints. However, for control purposes, one must select actual SM actions that satisfy the hard constraints of Eq. (2). Another issue is that the solutions of the decoupled POMDPs provide individual sensor schedules for each location that must be interleaved into a single coherent sensor schedule. Furthermore, exact solution of the small decoupled POMDPs for each set of prices can be time-consuming, making the resulting algorithm unsuited for real-time SM. To address this, we explore a set of RH algorithms that convert the mixed strategy solutions discussed in the previous section to actions that satisfy the hard constraints, and that limit the computational complexity of the resulting algorithm. The RH algorithms have adjustable parameters whose effects we explore in simulation.

The RH algorithms start at stage t with an information state/resource state pair, consisting of available information about each location i = 1, …, N represented by the conditional probability vector π_i(t), and available sensor resources R_s(t), s = 1, …, S. The first step in the algorithms is to solve the SM problem of Eq. (5) starting at stage t to final stage T subject to the soft constraints of Eq. (6), using the hierarchical column generation / POMDP algorithms to obtain a set of mixed strategies. We introduce a parameter corresponding to the maximum number of sensing actions per location to control the resulting computational complexity of the POMDP algorithms. The second step is to select sensing actions to implement at the current stage t from the mixed strategies. These strategies are mixtures of at most S + 1 pure strategies, with associated probabilistic weights.
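As a concrete check on this support-size property, the converged mixture in the Table I example of Section III is supported on exactly S + 1 = 3 strategies (columns γ2, γ5 and γ6), and its weights can be recovered by solving the system in which both soft resource constraints and the simplex constraint hold with equality. The sketch below uses the expected costs and resource uses read off Table I:

```python
# Support strategies read off Table I (expected cost; E[resource use] per sensor):
#   gamma2: cost 2.80, uses (218, 0);  gamma5: cost 8, uses (0, 200);
#   gamma6: cost 10, uses (100, 0).  Both budgets are R_s = 100.

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

A = [[218.0,   0.0, 100.0],   # sensor 1: E[resource use] = R_1 = 100
     [  0.0, 200.0,   0.0],   # sensor 2: E[resource use] = R_2 = 100
     [  1.0,   1.0,   1.0]]   # simplex: weights sum to 1
q2, q5, q6 = solve(A, [100.0, 100.0, 1.0])   # -> about 0.424, 0.500, 0.076
J_star = 2.80 * q2 + 8.0 * q5 + 10.0 * q6    # -> about 5.95
```

The recovered weights match the "Mixture weights" row of Table I, and the blended expected cost matches the reported optimal cost of 5.95 units.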
We explore three approaches for selecting sensing actions:

• str1: Select the pure strategy with maximum probability.
• str2: Randomly select a pure strategy per location according to the optimal mixture probabilities.
• str3: Select the pure strategy with positive probability that minimizes the expected sensor resource use (and thus leaves resources for use in future stages).

Once pure strategies for each location have been selected, the third step is to select a sensing action to be implemented for each location. Our approach is to select the first sensing action of the pure strategy for each location. Note that there may not be enough sensor resources to execute the selected actions, particularly in the case where the pure strategy with maximum probability is selected. To address this, we rank sensing actions by their expected entropy gain [26], which is the expected reduction in entropy of the conditional probability distribution π_i(t) based on the anticipated measurement value. We schedule sensor actions in order of decreasing expected entropy gain, and perform those actions at stage t that have enough sensor resources to be feasible. The measurements collected from the scheduled actions are used to update the information states π_i(t + 1) using Eq. (3). The resources used by the actions are subtracted from the available resources to compute R_s(t + 1) using Eq. (4). The RH algorithm is then executed from the new information state/resource state condition.
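The entropy-gain ranking in this third step can be sketched as follows. This is a simplified illustration rather than the paper's code: the candidate actions, costs, and likelihood tables below are hypothetical:

```python
import math

def entropy(pi):
    """Shannon entropy of a discrete belief vector."""
    return -sum(p * math.log(p) for p in pi if p > 0)

def bayes_update(pi, lik, y):
    """Eq. (3): posterior belief and the prior probability of observing y."""
    unnorm = [lik[y][x] * pi[x] for x in range(len(pi))]
    p_y = sum(unnorm)
    if p_y == 0:
        return pi, 0.0
    return [u / p_y for u in unnorm], p_y

def expected_entropy_gain(pi, lik):
    """Expected reduction in entropy of pi over the random measurement y."""
    gain = entropy(pi)
    for y in range(len(lik)):
        post, p_y = bayes_update(pi, lik, y)
        if p_y > 0:
            gain -= p_y * entropy(post)
    return gain

def schedule(actions, resources):
    """Rank candidate actions by decreasing expected entropy gain and keep
    those that still fit in the owning sensor's remaining resources."""
    ranked = sorted(actions, key=lambda a: -expected_entropy_gain(a["pi"], a["lik"]))
    chosen = []
    for a in ranked:
        if a["cost"] <= resources[a["sensor"]]:
            resources[a["sensor"]] -= a["cost"]
            chosen.append(a["loc"])
    return chosen

# A near-uniform belief promises more entropy gain than a near-certain one,
# so location 1 is scheduled first and exhausts the single unit of budget.
lik = [[0.9, 0.1], [0.1, 0.9]]
actions = [
    {"loc": 1, "sensor": 0, "cost": 1.0, "pi": [0.5, 0.5], "lik": lik},
    {"loc": 2, "sensor": 0, "cost": 1.0, "pi": [0.99, 0.01], "lik": lik},
]
chosen = schedule(actions, {0: 1.0})   # -> [1]
```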
| x_i | Search y1 | y2 | y3 | Low-res y1 | y2 | y3 | Hi-res y1 | y2 | y3 |
|---|---|---|---|---|---|---|---|---|---|
| empty | 0.92 | 0.04 | 0.04 | 0.95 | 0.03 | 0.02 | 0.95 | 0.03 | 0.02 |
| car | 0.08 | 0.46 | 0.46 | 0.05 | 0.85 | 0.10 | 0.02 | 0.95 | 0.03 |
| truck | 0.08 | 0.46 | 0.46 | 0.05 | 0.10 | 0.85 | 0.02 | 0.90 | 0.08 |
| military | 0.08 | 0.46 | 0.46 | 0.05 | 0.10 | 0.85 | 0.02 | 0.03 | 0.95 |

TABLE II: Observation likelihoods for the different sensor modes, with observation symbols y1, y2 and y3.

Fig. 2: Illustration of scenario with two partially-overlapping sensors.

## V. SIMULATION RESULTS

To evaluate the relative performance of the different RH algorithms, we performed a set of simulation experiments. In these experiments, there were 100 locations, each of which could be empty or contain an object of one of three types, so the possible states of location i were x_i ∈ {0, 1, 2, 3}, where type 1 represents cars, type 2 trucks, and type 3 military vehicles. Sensors can have several modes: a search mode, a low-resolution mode, and a high-resolution mode. The search mode primarily detects the presence of objects; the low-resolution mode can identify cars but confuses the other two types, whereas the high-resolution mode can separate the three types. Observations are modeled as having three possible values. The search mode consumes 0.25 units of resources, whereas the low-resolution mode consumes 1 unit and the high-resolution mode 5 units, uniformly for each sensor and location. Table II shows the likelihood functions that were used in the simulations.

Initially, each location has one of two prior probability distributions: π_i(0) = [0.10 0.60 0.20 0.10]^T for i ∈ [1, …, 10], or π_i(0) = [0.80 0.12 0.06 0.02]^T for i ∈ [11, …, 100]. Thus, the first 10 locations are likely to contain objects, whereas the other 90 locations are likely to be empty. When multiple sensors are present, they may share some locations in common and have locations that can only be seen by a specific sensor, as illustrated in Fig. 2. The cost function used in the experiments, c(x_i, v_i), is shown in Table III.
The parameter MD represents the cost of a missed detection, and will be varied in the experiments.

| x_i \ v_i | empty | car | truck | military |
|---|---|---|---|---|
| empty | 0 | 1 | 1 | 1 |
| car | 1 | 0 | 0 | 1 |
| truck | 1 | 0 | 0 | 1 |
| military | MD | MD | MD | 0 |

TABLE III: Decision costs.

Table IV shows simulation results for a search and classify scenario involving 2 identical sensors (with the same visibility), evaluating different versions of the RH control algorithms at different resource levels. The variable "Horizon" is the total number of sensor actions allowed per location, plus one additional action for estimating the location content. The table shows results for different resource levels per sensor, from 30 to 70 units, and displays the computed lower bound performance. The missed detection cost MD is varied from 1 to 10. The results shown in Table IV represent the average of 100 Monte Carlo simulation runs of the RH algorithms.

| Horizon | Resources | MD=1 str1 | str2 | str3 | MD=5 str1 | str2 | str3 | MD=10 str1 | str2 | str3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 30 | 3.64 | 3.85 | 3.85 | 11.82 | 12.88 | 12.23 | 15.28 | 14.57 | 14.50 |
| 3 | 50 | 2.40 | 2.80 | 2.43 | 6.97 | 6.93 | 7.84 | 10.98 | 9.99 | 10.45 |
| 3 | 70 | 2.45 | 2.32 | 1.88 | 3.44 | 3.99 | 4.04 | 6.14 | 6.48 | 5.10 |
| 4 | 30 | 3.58 | 3.46 | 3.52 | 12.28 | 12.62 | 11.90 | 14.48 | 15.91 | 15.59 |
| 4 | 50 | 2.37 | 2.21 | 2.33 | 7.44 | 7.44 | 7.20 | 9.94 | 9.28 | 10.65 |
| 4 | 70 | 1.68 | 1.33 | 1.60 | 3.59 | 3.57 | 3.62 | 6.30 | 5.18 | 5.86 |
| 6 | 30 | 3.51 | 3.44 | 3.73 | 11.17 | 11.85 | 12.09 | 15.17 | 14.99 | 13.6 |
| 6 | 50 | 2.28 | 2.11 | 2.31 | 7.29 | 8.02 | 7.70 | 10.67 | 10.47 | 11.25 |
| 6 | 70 | 1.43 | 1.38 | 1.44 | 3.60 | 3.73 | 3.84 | 4.91 | 5.09 | 5.94 |
| Bound | 30 | 3.35 | | | 11.50 | | | 13.85 | | |
| Bound | 50 | 2.21 | | | 6.27 | | | 9.40 | | |
| Bound | 70 | 1.32 | | | 2.95 | | | 4.96 | | |

TABLE IV: Simulation results for 2 homogeneous, multi-modal sensors. str1: select the most likely pure strategy for all locations; str2: randomize the choice of strategy per location according to mixture probabilities; str3: select the strategy that yields the least expected use of resources for all locations.
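For reference, the final estimate v_i that minimizes the expected Bayes cost under the Table III decision costs follows directly from the terminal belief. A minimal sketch, with states indexed 0–3 for empty/car/truck/military and an illustrative belief vector:

```python
def bayes_decision(pi, MD):
    """Return (estimate v_i, expected cost) under the Table III decision costs.
    States and estimates are indexed 0..3 = empty, car, truck, military."""
    c = [[0, 1, 1, 1],        # true state: empty
         [1, 0, 0, 1],        # true state: car
         [1, 0, 0, 1],        # true state: truck
         [MD, MD, MD, 0]]     # true state: military (missed detection costs MD)
    exp_cost = [sum(pi[x] * c[x][v] for x in range(4)) for v in range(4)]
    v = exp_cost.index(min(exp_cost))
    return v, exp_cost[v]

# With a belief leaning "car" but a 20% chance of "military", raising MD
# flips the minimum-risk decision from car to military.
pi = [0.05, 0.60, 0.15, 0.20]
v_low, _ = bayes_decision(pi, MD=1)    # -> 1 (car)
v_high, _ = bayes_decision(pi, MD=10)  # -> 3 (military)
```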
The results show that using a longer planning horizon improves performance only minimally, so an RH replanning approach with a short horizon can reduce computation time with limited performance degradation. The results also show that the different RH algorithms have performance close to the optimal lower bound in most cases, the exception being the case of MD = 5 with 70 units of sensing resources per sensor. For a horizon 6 plan, the longest horizon studied, the simulation performance is close to that of the associated bound.

In terms of which strategy is preferable for converting the mixed strategies, the results of Table IV are unclear. For short planning horizons in the RH algorithms, the preferred strategy appears to be the one that uses the least resources (str3), thus allowing for improvement from replanning. For the longer horizons, there was no significant difference in performance among the three strategies.

To illustrate the computational requirements of this scenario (4 states, 3 observations, 2 sensors (6 actions), full sensor overlap), the number of columns generated by the column generation algorithm to compute a set of mixed strategies was on the order of 10–20 columns for the horizon 6 algorithms, which takes about 60 sec on a 2.2 GHz, single-core, Intel P4 machine under Linux using C code in 'Debug' mode (with 1000 belief points for PBVI). Memory usage without optimizations is around 3 MB. There are typically 4–5 planning sessions in a simulation. Profiling indicates that roughly 80% of the computing time goes towards value backups in the PBVI routine and 15% goes towards (recursively) tracing decision trees in order to back out (deduce) the measurement costs from hyperplane costs. (Every node in a decision tree / policy graph for each pure strategy has a corresponding hyperplane with a vector of cost coefficients that represent classification plus measurement costs.)

In the next set of experiments, we compare the use of heterogeneous sensors that have different modes available. In these experiments, the 100 locations are guaranteed to have an object, so x_i = 0 is not feasible. The prior probability of object type for each location is π_i(0) = [0 0.7 0.2 0.1]^T. Table V shows the results of experiments with sensors that have all sensing modes, versus an experiment where one sensor has only a low-resolution mode and the other sensor has both high- and low-resolution modes. The table shows the lower bounds predicted by the column generation algorithm, to illustrate the change in performance expected from the different architectural choices of sensors. The results indicate that specialization of one sensor can lead to significant degradation in performance due to inefficient use of its resources.

| Horizon | Resources | Homog. MD=1 | MD=5 | MD=10 | Heterog. MD=1 | MD=5 | MD=10 |
|---|---|---|---|---|---|---|---|
| 3 | [150, 150] | 5.69 | 16.93 | 30.38 | 6.34 | 18.15 | 31.23 |
| 3 | [250, 250] | 4.61 | 16.11 | 25.92 | 5.53 | 16.77 | 29.32 |
| 3 | [350, 350] | 4.23 | 15.30 | 21.45 | 5.12 | 16.41 | 27.41 |
| 4 | [150, 150] | 5.02 | 16.06 | 20.61 | 5.64 | 16.85 | 20.61 |
| 4 | [250, 250] | 3.94 | 9.46 | 12.66 | 4.58 | 12.05 | 14.87 |
| 4 | [350, 350] | 3.35 | 8.58 | 12.47 | 4.28 | 9.41 | 12.65 |
| 6 | [150, 150] | 4.62 | 15.66 | 19.56 | 5.27 | 16.20 | 19.56 |
| 6 | [250, 250] | 2.92 | 8.24 | 10.91 | 3.32 | 8.83 | 11.35 |
| 6 | [350, 350] | 2.18 | 4.86 | 7.15 | 2.66 | 6.63 | 9.17 |

TABLE V: Comparison of lower bounds for 2 homogeneous, bi-modal sensors vs. 2 heterogeneous sensors.

The next set of results explores the effect of spatial distribution of sensors.
We consider experiments where there are two homogeneous sensors which have only partially-overlapping coverage zones. (We define a 'visibility group' as a set of sensors that have a common coverage zone.) Table VI gives bounds for different percentages of overlap. Note that, even when there is only 20% overlap, the achievable performance is similar to that of the 100% overlap case in Table V, indicating that proper choice of strategies can lead to efficient sharing of resources from different sensors and an equalization of their workload.

| Horizon | Resources | 60% overlap, MD=1 | MD=5 | MD=10 | 20% overlap, MD=1 | MD=5 | MD=10 |
|---|---|---|---|---|---|---|---|
| 3 | [150, 150] | 5.69 | 16.93 | 30.38 | 5.69 | 16.93 | 30.38 |
| 3 | [250, 250] | 4.61 | 16.11 | **25.98** | 4.61 | 16.11 | **25.92** |
| 3 | [350, 350] | 4.23 | 15.30 | 21.45 | 4.23 | 15.30 | 21.45 |
| 4 | [150, 150] | 5.02 | **16.06** | 20.61 | 5.02 | **15.93** | 20.61 |
| 4 | [250, 250] | 3.94 | 9.46 | 12.66 | 3.94 | 9.46 | 12.66 |
| 4 | [350, 350] | 3.35 | 8.58 | 12.47 | 3.35 | 8.58 | 12.47 |
| 6 | [150, 150] | 4.62 | 15.66 | 19.56 | 4.62 | 15.66 | 19.56 |
| 6 | [250, 250] | **2.92** | **8.25** | 10.91 | **2.94** | **8.24** | 10.91 |
| 6 | [350, 350] | 2.18 | 4.86 | **7.19** | 2.18 | 4.86 | **7.16** |

TABLE VI: Comparison of performance bounds with 2 homogeneous sensors with partial overlap in coverage. Only the bold numbers are different.

Fig. 3: The 7 visibility groups for the 3-sensor experiment, indicating the number of locations in each group.

The last set of results shows the performance of the RH algorithms for three homogeneous sensors with partial overlap and different resource levels. The visibility groups are portrayed graphically in Fig. 3. Table VII presents the simulated cost values of the different RH algorithms, averaged over 100 simulations, together with the lower bounds. The results support our previous conclusions: when a short horizon is used in the RH algorithm and there are sufficient resources, the strategy that uses the least resources is preferred, as it allows for replanning when new information is available.
If the RH algorithm uses a longer horizon, then its performance approaches the theoretical lower bound, and the difference in performance between the three approaches for sampling the mixed strategy to obtain a pure strategy is statistically insignificant. Our results suggest that RH control with modest horizons of 2 or 3 sensor actions per location can yield performance close to the best achievable using mixed strategies. If shorter horizons are used to reduce computation, then an approach that samples mixed strategies by using the smallest amount of resources is preferred.

| Horizon | Resources | MD=1 str1 | str2 | str3 | MD=5 str1 | str2 | str3 | MD=10 str1 | str2 | str3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 100 | 5.26 | 6.08 | 5.57 | 17.23 | 17.44 | 16.79 | 22.02 | 21.93 | 22.16 |
| 3 | 166 | 5.91 | 4.81 | 3.13 | 10.23 | 11.91 | 9.21 | 14.19 | 16.66 | 12.85 |
| 3 | 233 | 3.30 | 3.75 | 3.43 | 10.15 | 9.32 | 5.88 | 14.49 | 12.55 | 8.21 |
| 4 | 100 | 5.32 | 5.58 | 5.93 | 17.26 | 16.88 | 16.17 | 21.92 | 20.94 | 21.35 |
| 4 | 166 | 3.42 | 4.07 | 3.24 | 8.63 | 8.00 | 9.04 | 12.05 | 11.71 | 14.08 |
| 4 | 233 | 3.65 | 3.07 | 3.29 | 5.27 | 7.14 | 5.38 | 8.25 | 10.08 | 7.90 |
| 6 | 100 | 5.79 | 5.51 | 5.98 | 17.13 | 17.90 | 17.44 | 22.03 | 20.56 | 22.17 |
| 6 | 166 | 2.96 | 2.68 | 2.72 | 10.22 | 8.33 | 9.08 | 9.82 | 11.47 | 11.57 |
| 6 | 233 | 1.52 | 2.00 | 1.70 | 4.81 | 4.13 | 4.24 | 5.64 | 7.20 | 5.11 |
| Bound | 100 | 4.62 | | | 15.66 | | | 19.56 | | |
| Bound | 166 | 2.92 | | | 8.22 | | | 10.89 | | |
| Bound | 233 | 2.18 | | | 4.87 | | | 7.18 | | |

TABLE VII: Simulation results for 3 homogeneous sensors with partial overlap as shown in Fig. 3.

The results also show that, with proper SM, geographically distributed sensors with limited visibility can be coordinated to achieve equivalent performance to centrally
pooled resources.

In terms of the computational complexity of our RH algorithms, the main bottleneck is the solution of the POMDP problems. The LPs solved in the column generation approach are small and are solved in minimal time. Solving the POMDPs required to generate each column (one POMDP for each visibility group in cases with partial sensor overlap) is tractable by virtue of the hierarchical breakdown of the SM problem into independent subproblems. It is also possible to accelerate these computations using multi-core CPU or (NVIDIA) GPU processors, as the POMDPs are highly parallelizable.

## VI. CONCLUSIONS

In this paper, we introduce RH algorithms for near-optimal, closed-loop SM with multi-modal, resource-constrained, heterogeneous sensors. These RH algorithms exploit a lower bound formulation developed in earlier work that decomposes the SM optimization into a master problem, which is addressed with linear-programming techniques, and single-location stochastic control problems that are solved using POMDP algorithms. The resulting algorithm generates mixed strategies for sensor plans, and the RH algorithms convert these mixed strategies into sensor actions that satisfy sensor resource constraints.

Our simulation results show that the RH algorithms achieve performance close to that of the theoretical lower bounds in [21]. The results also highlight the different benefits of choosing a longer horizon for RH strategies and alternative approaches to sampling the mixed strategy solutions. Our simulations also show the effects of geographically distributing sensors so that there is limited overlap in field of view, and the effects of specializing sensors by using a restricted number of modes.

There are many interesting directions for extensions to this work. First, one could consider the presence of object dynamics, where objects can arrive at or depart from specific locations.
Second, one could consider options for sensor motion, where individual sensors can change locations and thus observe new areas. Third, one could consider a set of objects that have deterministic, but time-varying, visibility profiles. Finally, one could consider approaches that reduce the computational complexity of the resulting algorithms, either through exploitation of parallel computing architectures or through the use of off-line learning or other approximation techniques.

REFERENCES

[1] B. Koopman, Search and Screening: General Principles with Historical Applications. Pergamon, New York, NY, 1980.
[2] S. J. Benkoski, M. G. Monticino, and J. R. Weisinger, "A survey of the search theory literature," Naval Research Logistics, vol. 38, no. 4, pp. 469–494, 1991.
[3] D. A. Castañón, "Optimal search strategies in dynamic hypothesis testing," IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no. 7, pp. 1130–1138, Jul. 1995.
[4] A. Wald, "On the efficient design of statistical investigations," The Annals of Mathematical Statistics, vol. 14, pp. 134–140, 1943.
[5] ——, "Sequential tests of statistical hypotheses," The Annals of Mathematical Statistics, vol. 16, no. 2, pp. 117–186, 1945. [Online]. Available: http://www.jstor.org/stable/2235829
[6] D. V. Lindley, "On a measure of the information provided by an experiment," Annals of Mathematical Statistics, vol. 27, pp. 986–1005, 1956.
[7] J. C. Kiefer, "Optimum experimental designs," Journal of the Royal Statistical Society, Series B, vol. 21, pp. 272–319, 1959.
[8] H. Chernoff, Sequential Analysis and Optimal Design. SIAM, Philadelphia, PA, 1972.
[9] V. V. Fedorov, Theory of Optimal Experiments. Academic Press, New York, 1972.
[10] K. Kastella, "Discrimination gain to optimize detection and classification," IEEE Transactions on Systems, Man and Cybernetics, Part A, vol. 27, no. 1, pp. 112–116, Jan. 1997.
[11] C. Kreucher, K. Kastella, and A. O. Hero III, "Sensor management using an active sensing approach," Signal Processing, vol. 85, no. 3, pp. 607–624, 2005.
[12] M. Athans, "On the determination of optimal costly measurement strategies for linear stochastic systems," Automatica, vol. 8, no. 4, pp. 397–412, 1972. [Online]. Available: http://www.sciencedirect.com/science/article/B6V21-47SV18C-5/2/dc50e03b2ec82f34c592d4056c0da466
[13] V. Krishnamurthy and R. Evans, "Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking," IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 2893–2908, Dec. 2001.
[14] R. Washburn, M. Schneider, and J. Fox, "Stochastic dynamic programming based approaches to sensor resource management," in Proceedings of the Fifth International Conference on Information Fusion, 2002, vol. 1, pp. 608–615.
[15] J. C. Gittins, "Bandit processes and dynamic allocation indices," Journal of the Royal Statistical Society, Series B (Methodological), vol. 41, no. 2, pp. 148–177, 1979. [Online]. Available: http://www.jstor.org/stable/2985029
[16] W. Macready and D. H. Wolpert, "Bandit problems and the exploration/exploitation tradeoff," IEEE Transactions on Evolutionary Computation, vol. 2, no. 1, pp. 2–22, Apr. 1998.
[17] C. Kreucher and A. O. Hero III, "Monte Carlo methods for sensor management in target tracking," in IEEE Nonlinear Statistical Signal Processing Workshop, 2006.
[18] E. Chong, C. Kreucher, and A. Hero, "Monte-Carlo-based partially observable Markov decision process approximations for adaptive sensing," in Proceedings of the 9th International Workshop on Discrete Event Systems (WODES 2008), May 2008, pp. 173–180.
[19] D. A. Castañón, A. Hero, D. Cochran, and K. Kastella, Foundations and Applications of Sensor Management, 1st ed. Springer, 2008, ch. 1.
[20] D. A. Castañón, "Approximate dynamic programming for sensor management," in Proc. 36th Conference on Decision and Control. IEEE, 1997, pp. 1202–1207.
[21] ——, "Stochastic control bounds on sensor network performance," in Proceedings of the 44th IEEE Conference on Decision and Control and 2005 European Control Conference (CDC-ECC '05), Dec. 2005, pp. 4939–4944.
[22] P. C. Gilmore and R. E. Gomory, "A linear programming approach to the cutting-stock problem," Operations Research, vol. 9, no. 6, pp. 849–859, 1961. [Online]. Available: http://www.jstor.org/stable/167051
[23] G. B. Dantzig and P. Wolfe, "The decomposition algorithm for linear programs," Econometrica, vol. 29, no. 4, pp. 767–778, 1961. [Online]. Available: http://www.jstor.org/stable/1911818
[24] K. A. Yost and A. R. Washburn, "The LP/POMDP marriage: Optimization with imperfect information," Naval Research Logistics, vol. 47, no. 8, pp. 607–619, 2000.
[25] J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs," in International Joint Conference on Artificial Intelligence (IJCAI), Aug. 2003, pp. 1025–1032.
[26] K. Kastella, "Discrimination gain for sensor management in multitarget detection and tracking," in IEEE-SMC and IMACS Multiconference CESA 1996, vol. 1, Jul. 1996, pp. 167–172.