A multi-armed bandit approach to following a Markov Chain

Author
Akin, Ezra W.
Date
2017-06
Advisor
Szechtman, Roberto
Second Reader
Kress, Moshe
Abstract
Across the defense, homeland security, and law enforcement communities, leaders face the tension between making quick decisions and making well-informed ones regarding time-dependent entities of interest. For example, consider a law enforcement organization (searcher) with a sizable list of potential terrorists (targets) but far fewer observational assets (sensors). The searcher's goal is to follow the target, but resource constraints make continuous coverage impossible, resulting in intermittent observation attempts.

We model target behavior as a discrete-time Markov chain whose state space is the target's set of possible locations, activities, or attributes. In this setting, following the target means that, at each time step, the searcher correctly identifies the state with the highest probability of containing the target and allocates the sensor to it. In other words, in each time period the searcher decides where to send the sensor in an attempt to observe the target, and the resulting hit or miss is what the searcher learns from about the target's true transition behavior. We develop a multi-armed bandit approach for efficiently following the target, in which each state plays the role of an arm. Our search policy performs five to ten times better than existing approaches.
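The abstract describes the policy only at a high level. The sketch below is one way such a follower could look, under assumptions that are not taken from the thesis: Dirichlet priors on each row of the unknown transition matrix, Thompson sampling to choose which state to watch, and a Bayesian belief vector over the target's location updated from hits and misses. It is meant only to make the hit/miss learning loop concrete, not to reproduce the author's actual policy or its reported performance.

import numpy as np

# Minimal sketch of a bandit-style "follow the target" loop.
# Assumptions (illustrative, not the thesis's method): Dirichlet priors on
# each row of the unknown transition matrix, Thompson sampling to pick the
# state to watch, and a belief vector over the target's current state
# updated from hit/miss observations.

rng = np.random.default_rng(0)
n = 5                                    # number of states (arms)
P_true = rng.dirichlet(np.ones(n), n)    # hidden true transition matrix

alpha = np.ones((n, n))                  # Dirichlet counts, one row per state
belief = np.full(n, 1.0 / n)             # belief over the target's location
state = rng.integers(n)                  # target's true (hidden) state
last_hit, t_last_hit = None, None        # where/when the target was last seen

hits, T = 0, 1000
for t in range(T):
    state = rng.choice(n, p=P_true[state])           # target moves one step

    P_mean = alpha / alpha.sum(axis=1, keepdims=True)
    P_sample = np.array([rng.dirichlet(row) for row in alpha])

    # Thompson step: propagate the belief through a sampled transition
    # matrix and send the sensor to the most likely state.
    arm = int(np.argmax(belief @ P_sample))

    if arm == state:                                  # hit
        hits += 1
        if last_hit is not None and t_last_hit == t - 1:
            alpha[last_hit, arm] += 1                 # observed a transition
        belief = np.zeros(n)
        belief[arm] = 1.0
        last_hit, t_last_hit = arm, t
    else:                                             # miss
        belief = belief @ P_mean                      # propagate one step
        belief[arm] = 0.0                             # target was not there
        belief /= belief.sum()

print(f"hit rate over {T} periods: {hits / T:.3f}")

In this sketch the searcher can only credit a transition in the posterior when it locates the target in two consecutive periods; how (or whether) the thesis handles partial observability of transitions is not stated in the abstract.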
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.

Related items
Showing items related by title, author, creator and subject.
- When is Information Sufficient for Action? Search with Unreliable Yet Informative Intelligence
Atkinson, Michael P.; Lange, Rutger-Jan (2016); We analyze a variant of the whereabouts search problem, in which a searcher looks for a target hiding in one of n possible locations. Unlike in the classic version, our searcher does not pursue the target by actively moving ...

- A new branch-and-bound procedure for computing optimal search paths
Martins, Gustavo H. A. (Monterey, California. Naval Postgraduate School, 1993-03); We consider the problem of a searcher trying to detect a target that moves among a finite set of cells, C = 1, ..., N, in discrete time, according to a specified Markov process. In each time period the searcher chooses one ...

- Stop and Look Detection Algorithm
Andrus, Alvin F. (Monterey, California. Naval Postgraduate School, 1985-05); NPS55-85-011; The Stop and Look Detection Algorithm is a procedure for computing the cumulative probability of detection as a function of time for a searcher looking discretely for an evading target. The assumptions required for computation ...