Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation
Alt, Jonathan K.
Darken, Christian J.
Modeling and simulation of military operations requires human behavior models capable of learning from experience in complex environments where feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of AI learning algorithms, to address this need. A novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator is described. Empirical results indicate that this relatively straightforward approach improves learning speed both in benchmark environments and in challenging applied settings. Three applications of reinforcement learning are presented: verifying the reward structure of a training simulation, improving the performance of a discrete event simulation scheduling tool, and enabling adaptive decision-making in combat simulation. To place reinforcement learning within the context of broader models of human information processing, a practical cognitive architecture is developed and applied to the representation of a population within a conflict area. These varied applications and domains demonstrate the considerable potential of reinforcement learning within modeling and simulation.
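The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of an exponentially weighted average reward used as an action-value estimate, shown on a simple noisy multi-armed bandit. The class name `EWAAgent`, the function `run_bandit`, the epsilon-greedy exploration rule, and the values of `alpha` and `epsilon` are all assumptions made for illustration; they are not taken from the dissertation.

```python
import random


class EWAAgent:
    """Hypothetical epsilon-greedy agent (illustrative, not the author's algorithm).

    Each action value is an exponentially weighted average of the rewards
    observed for that action, so recent rewards count more than old ones.
    """

    def __init__(self, n_actions, alpha=0.1, epsilon=0.1, seed=0):
        self.alpha = alpha          # weight given to the newest reward (assumed value)
        self.epsilon = epsilon      # exploration probability (assumed value)
        self.values = [0.0] * n_actions
        self.rng = random.Random(seed)

    def select_action(self):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Exponentially weighted average: V <- V + alpha * (reward - V)
        self.values[action] += self.alpha * (reward - self.values[action])


def run_bandit(true_means, steps=5000, noise_sd=1.0, seed=0):
    """Run the agent on a bandit whose rewards are Gaussian around true_means."""
    agent = EWAAgent(len(true_means), seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(steps):
        action = agent.select_action()
        reward = env_rng.gauss(true_means[action], noise_sd)  # noisy feedback
        agent.update(action, reward)
    return agent.values


if __name__ == "__main__":
    print(run_bandit([0.0, 0.5, 2.0]))
```

A constant `alpha` keeps the estimate responsive when reward distributions drift, at the cost of never fully averaging out the noise; a smaller `alpha` smooths more but adapts more slowly.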
Approved for public release; distribution is unlimited
Showing items related by title, author, creator and subject.
Papadopoulos, Sotirios (Monterey, California. Naval Postgraduate School, 2010-09); The Cultural Geography (CG) model, under development in TRAC Monterey, is an open-source agent-based social simulation, designed to offer an insight into the response of the civilian population during Irregular Warfare ...
Loomis, Jean B. (Monterey, California: Naval Postgraduate School, 2016-06); In the event an underwater improvised explosive device (IED) were placed near a bridge, Explosive Ordnance Disposal (EOD) units would typically mitigate the threat by conducting a controlled detonation of the bomb. The ...
Giammarco, Kristin; Troncale, Len (Systems, 2018-05-28); This article describes preliminary research (a proof of concept test) on the potential value of formalizing Isomorphic Systems Processes (ISPs) based on systems science research using the Monterey Phoenix (MP) language, ...