Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation

Authors
Alt, Jonathan K.
Subjects
reinforcement learning
architecture
agents
autonomous systems
Advisors
Darken, Christian J.
Date of Issue
2012-09
Publisher
Monterey, California. Naval Postgraduate School
Abstract
Modeling and simulation of military operations requires human behavior models capable of learning from experience in complex environments where feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of AI learning algorithms, to address this need. A novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator is described. Empirical results indicate that this relatively straightforward approach improves learning speed in both benchmark environments and challenging applied settings. Applications of reinforcement learning are presented in verifying the reward structure of a training simulation, improving the performance of a discrete event simulation scheduling tool, and enabling adaptive decision-making in combat simulation. To place reinforcement learning within the context of broader models of human information processing, a practical cognitive architecture is developed and applied to the representation of a population within a conflict area. These varied applications and domains demonstrate that reinforcement learning has great potential within modeling and simulation.
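The abstract does not give the details of the thesis algorithm, so the following is only a minimal sketch of the general idea it names: keeping an exponentially weighted average of observed rewards as the value estimate for each action. It assumes the standard constant-step-size update Q(a) <- Q(a) + alpha * (r - Q(a)) on a simple multi-armed bandit with Gaussian reward noise. The function names, the epsilon-greedy action selection, and all parameter values are hypothetical choices for illustration and are not taken from the thesis.

```python
import random


def ewa_update(estimate, reward, alpha=0.1):
    """Move the estimate a fraction alpha toward the newly observed reward.

    Repeated application yields an exponentially weighted average in which
    recent rewards count more than older ones.
    """
    return estimate + alpha * (reward - estimate)


def run_bandit(true_means, steps=5000, alpha=0.1, epsilon=0.1, noise=1.0, seed=0):
    """Epsilon-greedy bandit using ewa_update as the action-value estimator.

    true_means: hypothetical mean reward of each action (assumed for the demo).
    Returns the list of learned action-value estimates.
    """
    rng = random.Random(seed)
    q = [0.0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(q))  # explore: pick a random action
        else:
            action = max(range(len(q)), key=q.__getitem__)  # exploit
        reward = rng.gauss(true_means[action], noise)  # noisy feedback
        q[action] = ewa_update(q[action], reward, alpha)
    return q


if __name__ == "__main__":
    print(run_bandit([0.0, 1.0]))
```

A larger alpha in this sketch tracks recent rewards more closely but leaves the estimate noisier, while a smaller alpha smooths noise at the cost of slower adaptation.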
Type
Thesis
Organization
Modeling, Virtual Environments, and Simulation Institute (MOVES)
Distribution Statement
Approved for public release; distribution is unlimited.