Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback
Wenjia Ba, Tianyi Lin, Jiawei Zhang, Zhengyuan Zhou
Curious about how players can learn and adapt in unknown games without knowing the game’s dynamics? In “Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback,” Ba, Lin, Zhang, and Zhou present a novel bandit learning algorithm for no-regret learning in games where each player observes only the reward determined by all players’ current joint action, not the gradient of its own reward function. Focusing on smooth and strongly monotone games, they introduce a bandit learning algorithm built on self-concordant barrier functions. The algorithm achieves both the optimal single-agent regret and the optimal last-iterate rate of convergence to the Nash equilibrium in multi-agent learning. Their work substantially improves on previous methods, and numerical results across a range of applications demonstrate the algorithm’s effectiveness.
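To make the setting concrete, the sketch below illustrates the general flavor of bandit learning with a self-concordant barrier; it is not the authors’ exact algorithm. A player repeatedly perturbs its iterate inside an ellipsoid shaped by the barrier’s Hessian, observes a single scalar reward for the joint action actually played, forms a one-point gradient estimate, and takes a gradient step. The log barrier on a box, the oracle `reward_oracle`, and the step-size and perturbation schedules are all illustrative placeholders.

```python
import numpy as np

# Hedged sketch: single-point bandit gradient estimation combined with a
# barrier-shaped perturbation and a gradient-ascent update. The log barrier
# on a box stands in for a general self-concordant barrier; eta and delta
# schedules are placeholders, not the rates from the paper's analysis.

def log_barrier_hessian(x, lo, hi):
    """Hessian of the log barrier -sum(log(x - lo) + log(hi - x)) on a box."""
    return np.diag(1.0 / (x - lo) ** 2 + 1.0 / (hi - x) ** 2)

def bandit_play(reward_oracle, lo, hi, T, seed=0):
    """One player's loop under bandit feedback: only a scalar reward is observed."""
    rng = np.random.default_rng(seed)
    d = lo.size
    x = (lo + hi) / 2.0                                 # start at the analytic center
    for t in range(1, T + 1):
        eta = 1.0 / t                                   # placeholder step size
        delta = t ** (-1.0 / 3.0)                       # placeholder perturbation radius
        H = log_barrier_hessian(x, lo, hi)
        A = np.linalg.cholesky(np.linalg.inv(H))        # shapes the exploration ellipsoid
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                          # uniform direction on the sphere
        x_play = x + delta * A @ u                      # perturbed action actually played
        r = reward_oracle(x_play)                       # bandit feedback: a single scalar
        g_hat = (d / delta) * r * np.linalg.solve(A.T, u)  # one-point gradient estimate
        x = x + eta * g_hat                             # ascent on the estimated gradient
        x = np.clip(x, lo + 1e-6, hi - 1e-6)            # stay strictly inside the box
    return x
```

The design choice worth noting is that the barrier’s Hessian defines the local geometry used for exploration, so the perturbation ellipsoid automatically shrinks near the boundary of the feasible set; this is the role self-concordant barriers typically play in bandit methods over constrained action sets.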