Efficient algorithms for budget-constrained Markov decision processes
Authors
Caramanis, Constantine
Dimitrov, Nedialko B.
Morton, David P.
Date of Issue
2014
Abstract
Discounted, discrete-time, discrete state-space, discrete action-space Markov decision processes (MDPs) form a classical topic in control, game theory, and learning, and as a result are widely applied, increasingly in very large-scale applications. Many algorithms have been developed to solve large-scale MDPs. Algorithms based on value iteration are particularly popular, as they are more efficient than the generic linear programming approach by an order of magnitude in the number of states of the MDP. Yet in the case of budget-constrained MDPs, no algorithm more efficient than linear programming is known. The theoretically slower running time of linear programming may limit the scalability of constrained MDPs in practice, while, theoretically, it invites the question of whether the increase is somehow intrinsic. In this paper we show that it is not, and provide two algorithms for budget-constrained MDPs that are as efficient as value iteration. Denoting the running time of value iteration by VI and the magnitude of the input by U, for an MDP with m expected budget constraints our first algorithm runs in time O(poly(m, log U) · VI). Given a pre-specified degree of precision ε for satisfying the budget constraints, our second algorithm runs in time O(log(m) · poly(log U) · (1/ε²) · VI), but may produce solutions that overutilize each of the m budgets by a multiplicative factor of 1 + ε. In fact, one can substitute value iteration with any algorithm, possibly specially designed for a specific MDP, that solves the MDP quickly, to achieve similar theoretical guarantees. Both algorithms restrict attention to constrained infinite-horizon MDPs under discounted costs.
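The abstract treats the unconstrained MDP solver as a black-box subroutine, with value iteration as the canonical choice. For reference, here is a minimal Python/NumPy sketch of value iteration for a discounted, cost-minimizing MDP; the array layout, function name, stopping tolerance, and greedy-policy extraction are illustrative assumptions, not the paper's implementation.

import numpy as np

def value_iteration(P, c, gamma, tol=1e-8):
    """Minimal value iteration for a discounted, cost-minimizing MDP.

    P     -- transition tensor, shape (A, S, S): P[a, s, s2] = Pr(s2 | s, a)
    c     -- immediate costs, shape (A, S)
    gamma -- discount factor, 0 < gamma < 1
    tol   -- sup-norm stopping tolerance (an assumed convergence criterion)
    """
    num_actions, num_states, _ = P.shape
    v = np.zeros(num_states)
    while True:
        # Bellman backup: Q[a, s] = c[a, s] + gamma * sum_{s2} P[a, s, s2] * v[s2]
        q = c + gamma * (P @ v)              # shape (A, S)
        v_next = q.min(axis=0)               # minimize expected cost over actions
        if np.max(np.abs(v_next - v)) < tol:
            return v_next, q.argmin(axis=0)  # value function and a greedy policy
        v = v_next

Each iteration of this dense form costs O(|A|·|S|²), and the gamma-contraction of the Bellman backup is what makes value iteration faster than the generic linear program; the budget constraints themselves are handled outside this subroutine by the paper's two algorithms.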
Type
Article
Department
Operations Research
Organization
Naval Postgraduate School (U.S.)
Sponsors
This work has been supported by the NSF through grants CMMI-0653916 and CMMI-0800676, by DTRA through grant HDTRA1-08-1-0029, and by the US DHS through grant 2008-DN-077-AR1021-05.
Funder
Grant CMMI-0653916, Grant CMMI-0800676,
Grant HDTRA1-08-1-0029, Grant 2008-DN-077-AR1021-05
Format
6 p.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.