Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback

We revisit the problem of learning in two-player zero-sum Markov games, focusing on developing an algorithm that is uncoupled, …

Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng