
Previous Work

Article [2] proposed Q-learning with independently learning agents.

However, each agent's policy changes during training, so from any single agent's perspective the environment is non-stationary. [3] addresses this by feeding the other agents' policy parameters into the Q-function.
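
As a rough illustration of that idea (a minimal sketch, not code from [3]; the layer sizes and the way the other agents' parameters are flattened into a vector are assumptions of mine), the Q-network simply receives the other agents' policy parameters as extra input features:

```python
# Hypothetical sketch: a Q-network that additionally conditions on the other
# agents' policy parameters, so that from agent i's point of view the
# environment looks (closer to) stationary again.
import torch
import torch.nn as nn

class PolicyConditionedQ(nn.Module):
    def __init__(self, obs_dim, n_actions, other_param_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + other_param_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q-value per own action
        )

    def forward(self, obs, other_policy_params):
        # other_policy_params: a flattened vector of the other agents'
        # (estimated) policy parameters, treated as extra input features.
        x = torch.cat([obs, other_policy_params], dim=-1)
        return self.net(x)

# Usage: Q-values for agent i, given its observation and a flattened copy of
# the other agents' current policy parameters.
q_net = PolicyConditionedQ(obs_dim=8, n_actions=4, other_param_dim=16)
q_values = q_net(torch.randn(1, 8), torch.randn(1, 16))
```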

In the same year, shortly before [1] was published, [4] proposed an approach similar to that of [1]. The main differences are as follows:

  1. It uses a single critic for all agents (see the sketch after this list).
  2. It combines recurrent policies with feed-forward critics.
  3. It learns discrete policies.

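To make difference 1 concrete, here is a schematic sketch (the dimensions and network architectures are hypothetical, not taken from [4] or [1]) contrasting a single centralized critic shared by all agents with MADDPG-style per-agent critics:

```python
# Illustrative sketch only: one shared centralized critic vs. per-agent
# centralized critics, both conditioned on all observations and all actions.
import torch
import torch.nn as nn

def make_critic(obs_dim, act_dim, n_agents, hidden=64):
    # Centralized critic: sees every agent's observation and action.
    return nn.Sequential(
        nn.Linear(n_agents * (obs_dim + act_dim), hidden),
        nn.ReLU(),
        nn.Linear(hidden, 1),
    )

n_agents, obs_dim, act_dim = 3, 8, 2

# A single critic for all agents (difference 1 above), which implicitly
# assumes the agents share a reward.
shared_critic = make_critic(obs_dim, act_dim, n_agents)

# Per-agent critics as in [1]: one critic per agent, so each agent can have
# its own reward function (cooperative, competitive, or mixed settings).
per_agent_critics = [make_critic(obs_dim, act_dim, n_agents) for _ in range(n_agents)]

joint_input = torch.randn(1, n_agents * (obs_dim + act_dim))
print(shared_critic(joint_input).shape, per_agent_critics[0](joint_input).shape)
```
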
Follow up

In [1], the emergence of language was discussed, but the focus was more on DRL itself. Article [5] uses VisDial to achieve 'alignment' with English, i.e. to show visually how the agents interact through 'language'.

[6] uses a centralized critic to estimate the Q-function and decentralized actors to optimize the agents' policies. Unlike [1], this article explicitly addresses multi-agent credit assignment: it builds off the idea of difference rewards via a counterfactual baseline, and it is evaluated on cooperative tasks with discrete action spaces.
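
A simplified reading of that counterfactual baseline (the function name and tensor shapes below are my own, not code from [6]): the baseline marginalizes out one agent's own action while keeping the other agents' actions fixed.

```python
# Sketch of a COMA-style counterfactual advantage for a single agent a.
import torch

def counterfactual_advantage(q_values, pi_a, taken_action):
    """
    q_values: shape (n_actions_a,), centralized Q(s, (u^{-a}, u'^a)) for every
              alternative action u'^a of agent a, with the other agents'
              actions u^{-a} held fixed.
    pi_a:     shape (n_actions_a,), agent a's policy probabilities pi^a(u'^a | tau^a).
    taken_action: index of the action agent a actually took.
    """
    baseline = torch.dot(pi_a, q_values)       # expected Q under agent a's own policy
    return q_values[taken_action] - baseline   # difference-rewards-style advantage

adv = counterfactual_advantage(
    q_values=torch.tensor([1.0, 0.5, 2.0]),
    pi_a=torch.tensor([0.2, 0.3, 0.5]),
    taken_action=2,
)
print(adv)  # how much better the chosen action was than agent a's "average" action
```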

Article [7] suggests that [1] does not explicitly reason about the learning behavior of other agents. It instead proposes a method in which each agent shapes the anticipated learning of the other agents in the environment.
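
A toy sketch of that idea (a simplified matrix-game setup of my own, not the experiments or code from [7]): agent 1 differentiates through agent 2's anticipated one-step learning update instead of treating the opponent's parameters as fixed.

```python
import torch

# Each agent's "policy" is one logit for playing its first action in a 2x2 game.
payoff1 = torch.tensor([[1.0, -1.0], [-1.0, 1.0]])
payoff2 = -payoff1  # zero-sum, purely for illustration

def values(theta1, theta2):
    p1 = torch.sigmoid(theta1)  # prob. agent 1 plays action 0
    p2 = torch.sigmoid(theta2)
    probs = torch.stack([p1 * p2, p1 * (1 - p2), (1 - p1) * p2, (1 - p1) * (1 - p2)])
    return torch.dot(probs, payoff1.flatten()), torch.dot(probs, payoff2.flatten())

theta1 = torch.tensor(0.1, requires_grad=True)
theta2 = torch.tensor(-0.2, requires_grad=True)
lr = 0.1

# Agent 2's anticipated naive-learner update, kept inside the autograd graph.
_, v2 = values(theta1, theta2)
delta2 = lr * torch.autograd.grad(v2, theta2, create_graph=True)[0]

# LOLA-style step: agent 1 maximizes its value at the opponent's *updated*
# parameters, so the gradient flows through delta2 and shapes agent 2's learning.
v1_shaped, _ = values(theta1, theta2 + delta2)
grad1 = torch.autograd.grad(v1_shaped, theta1)[0]
with torch.no_grad():
    theta1 += lr * grad1
```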

In article [8], a new metric, joint-policy correlation, was introduced to quantify how strongly policies learned with independent RL (InRL) overfit to the other agents' policies during training and then fail to generalize sufficiently during execution. This observation also motivates the PSRO/DCH algorithm proposed in [8].
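
A small sketch of how such a metric can be computed (this is my reading of [8]'s "average proportional loss"; the exact normalization in the paper may differ): train several independent runs, cross-pair player 1 from run i with player 2 from run j, and measure how much reward is lost off the diagonal.

```python
import numpy as np

def average_proportional_loss(returns):
    """
    returns: (n_runs, n_runs) matrix; entry (i, j) is the mean joint return when
             player 1's policy from run i plays with player 2's policy from run j.
             Diagonal entries are same-run ("matched") pairs.
    """
    returns = np.asarray(returns, dtype=float)
    d = np.mean(np.diag(returns))              # matched, co-trained pairs
    mask = ~np.eye(len(returns), dtype=bool)
    o = np.mean(returns[mask])                 # cross-run pairs
    return (d - o) / d                         # high value => strong overfitting to training partners

# Example: runs that only cooperate well with their own training partner.
print(average_proportional_loss([[10.0, 2.0, 1.0],
                                 [3.0, 9.0, 2.0],
                                 [1.0, 2.0, 11.0]]))
```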

A scenario where agents compete in a 3D world rather than a 2D one was discussed in [9]. This article, however, did not use a centralized-critic algorithm.

[10] addresses the "learning-to-communicate" problem in multi-agent systems by combining a trainable Vector of Locally Aggregated Descriptors (VLAD) algorithm with reinforcement learning. By embedding a VLAD-based, end-to-end trainable communication-processing module (the VLAD Processing Core), CCNet can learn efficient communication protocols from scratch in partially observable environments and is robust to dynamic changes in the number of agents.
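
For intuition, here is a generic NetVLAD-style trainable VLAD layer (a common formulation; CCNet's actual VLAD Processing Core may differ in its details): each incoming message descriptor is softly assigned to learned cluster centers, and the residuals are aggregated into a fixed-size vector regardless of how many agents sent messages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrainableVLAD(nn.Module):
    def __init__(self, dim, n_clusters):
        super().__init__()
        self.assign = nn.Linear(dim, n_clusters)               # soft-assignment logits
        self.centers = nn.Parameter(torch.randn(n_clusters, dim))

    def forward(self, descriptors):
        # descriptors: (n_messages, dim); n_messages can vary between calls,
        # which is what gives robustness to a changing number of agents.
        a = F.softmax(self.assign(descriptors), dim=-1)        # (n, K)
        residuals = descriptors.unsqueeze(1) - self.centers    # (n, K, dim)
        vlad = (a.unsqueeze(-1) * residuals).sum(dim=0)        # (K, dim)
        return F.normalize(vlad, dim=-1).flatten()             # fixed-size output

layer = TrainableVLAD(dim=16, n_clusters=4)
print(layer(torch.randn(5, 16)).shape)  # same output size for 5 or 50 messages
```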

Lowe et al.'s solution [1] uses decentralized policies that cannot explicitly coordinate over joint actions, which limits the cooperation the agents can achieve. Article [11] mainly addresses this defect.


REFERENCE:

[1]: Lowe R, Wu Y, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Advances in Neural Information Processing Systems. 2017.

[2]: M. Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the tenth international conference on machine learning, pages 330–337, 1993.

[3]: G. Tesauro. Extending Q-learning to general adaptive multi-agent systems. In Advances in neural information processing systems, pages 871–878, 2004.

[4]: J. K. Gupta, M. Egorov, and M. Kochenderfer. Cooperative multi-agent control using deep reinforcement learning. 2017.

[5]: Das A, Kottur S, Moura J M F, et al. Learning cooperative visual dialog agents with deep reinforcement learning[J]. arXiv preprint arXiv:1703.06585, 2017

[6]: Foerster J, Farquhar G, Afouras T, et al. Counterfactual multi-agent policy gradients[J]. arXiv preprint arXiv:1705.08926, 2017.

[7]: Foerster J, Chen R Y, Al-Shedivat M, et al. Learning with opponent-learning awareness[C]//Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2018: 122-130

[8]: Lanctot M, Zambaldi V, Gruslys A, et al. A unified game-theoretic approach to multiagent reinforcement learning[C]//Advances in Neural Information Processing Systems. 2017: 4190-4203

[9]: Bansal T, Pachocki J, Sidor S, et al. Emergent complexity via multi-agent competition[J]. arXiv preprint arXiv:1710.03748, 2017

[10]: Wen X, Zha Z J, Wang Z, et al. CCNet: Cluster-Coordinated Net for Learning Multi-agent Communication Protocols with Reinforcement Learning[C]//Asian Conference on Machine Learning. 2018: 582-597

[11]: Foerster J N, de Witt C A S, Farquhar G, et al. Multi-Agent Common Knowledge Reinforcement Learning[J]. arXiv preprint arXiv:1810.11702, 2018
