Military stochastic scheduling treated as a "multi-armed bandit" problem
Glazebrook, Kevin D.
MetadataShow full item record
A Blue airborne force attacks a region defended by a single Red surface-to-air missile system (SAM). Red is uncertain about the Blues he faces, but is able to learn about them during the engagement. Red's objective is to develop a policy for shooting at the Blues to maximize the value of Blues shot down before he himself is destroyed. We show that index policies are optimal for Red in a range of scenarios and yield effective heuristics more generally. The quality of such index heuristics is confirmed in a computational study.