Approximation Benefits of Policy Gradient Methods with Aggregated States

doi:10.1287/mnsc.2023.4788

Daniel Russo

Management Science and Operations Research
Strategy and Management

Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state-aggregated representations, in which the state space is partitioned and either the policy or value function approximation is held constant over partitions. This paper shows a policy gradient method converges to a policy whose regret per period is bounded by ϵ, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as ϵ/(1 − γ), where γ is a discount factor. Faced with inherent approximation error, methods that locally optimize the true decision objective can be far more robust.

This paper was accepted by Hamid Nazerzadeh, data science. Supplemental Material: Data are available at https://doi.org/10.1287/mnsc.2023.4788.
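To make the quantity ϵ in the abstract concrete, the following is a minimal sketch that computes the largest within-partition difference for one fixed state-action value table. The function name `aggregation_gap`, the toy table `q`, and the `partition` mapping are all hypothetical and chosen for illustration; the paper's formal definition may range over more than a single value function, so this should be read as an illustration of the idea rather than the paper's exact definition.

```python
from collections import defaultdict


def aggregation_gap(q, partition):
    """Largest spread of q[s][a] across states s that share a partition cell.

    q: dict mapping state -> list of action values (hypothetical toy table).
    partition: dict mapping state -> cell label.
    """
    # Group states by the cell they are aggregated into.
    cells = defaultdict(list)
    for state, cell in partition.items():
        cells[cell].append(state)

    gap = 0.0
    for states in cells.values():
        n_actions = len(q[states[0]])
        for a in range(n_actions):
            # Compare the same action across all states in the cell.
            values = [q[s][a] for s in states]
            gap = max(gap, max(values) - min(values))
    return gap


# Toy example: four states, two actions, two partition cells (assumed data).
q = {0: [1.0, 0.5], 1: [1.2, 0.4], 2: [3.0, 2.0], 3: [3.1, 2.5]}
partition = {0: "A", 1: "A", 2: "B", 3: "B"}
eps = aggregation_gap(q, partition)
```

In this toy table the gap is driven by the second action in cell "B", where the two states' values differ the most. A partition that places every state in its own cell gives a gap of zero, which corresponds to a representation with no aggregation error.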
