Celebrating Diversity in Shared Multi-Agent Reinforcement Learning
Motivation
- Significance:
MARL is useful for many real-world applications: sensor networks, traffic management, robot coordination, etc.
- Problems:
It is hard to learn effective policies in complex multi-agent scenarios, since the joint action-observation space grows rapidly with the number of agents. PDSP (policy decentralization with shared parameters) is widely used to address this scalability problem.
But PDSP has a drawback: tasks usually require diversified policies among agents, while shared parameters lead to similar behaviors (under similar observations).
A tradeoff: share necessary parameters to accelerate learning while still improving diversity.
- Keywords:
MARL, PDSP, tradeoff
Backgrounds:
Dec-POMDP (decentralized partially observable Markov decision process), CTDE (centralized training with decentralized execution), IGM (Individual-Global-Max)
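As a quick reminder (sketched here from memory, not quoted from the paper), the IGM condition used in CTDE value factorization requires that the greedy joint action of the global value function decomposes into each agent's greedy individual action:

```latex
\operatorname*{arg\,max}_{\mathbf{a}} Q_{tot}(\boldsymbol{\tau}, \mathbf{a})
= \Big( \operatorname*{arg\,max}_{a_1} Q_1(\tau_1, a_1), \ \ldots, \ \operatorname*{arg\,max}_{a_n} Q_n(\tau_n, a_n) \Big)
```

QMIX enforces this via a monotonic mixing network; QPLEX realizes the full IGM function class with a duplex dueling architecture.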
Model
- Structure:
A diversity-driven MARL framework
- Theory:
Maximization of an information-theoretic objective
Action-Value Learning for Balancing Diversity and Sharing
Overall Learning Objective
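One common way such an information-theoretic diversity objective is instantiated (a sketch of the general idea, not the paper's exact derivation): maximize the mutual information between an agent's identity $i$ and its trajectory $\tau_i$, lower-bounded with a learned variational posterior $q_\xi$:

```latex
I(\tau_i; i) = H(i) - H(i \mid \tau_i)
\;\ge\; H(i) + \mathbb{E}_{i,\,\tau_i}\big[\log q_\xi(i \mid \tau_i)\big]
```

Intuitively, if the posterior can identify which agent produced a trajectory, the agents' behaviors are distinguishable, i.e. diverse.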
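A minimal sketch of the "balancing diversity and sharing" idea as I understand it: each agent's action values combine a shared component with a small agent-specific component, and an L1 penalty keeps the agent-specific part sparse so agents only deviate from the shared policy when the task demands it. All names below are my own illustration, not the authors' code; linear function approximators stand in for the paper's networks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, n_actions = 3, 4, 5

# Shared weights used by every agent (accelerates learning).
W_shared = rng.normal(size=(obs_dim, n_actions))
# Agent-specific weights (enable diversity); initialized to zero
# and kept small/sparse via an L1 regularizer.
W_agent = [np.zeros((obs_dim, n_actions)) for _ in range(n_agents)]

def q_values(i, obs):
    """Agent i's Q-values = shared part + agent-specific part."""
    return obs @ W_shared + obs @ W_agent[i]

def l1_penalty(lam=0.1):
    """L1 regularizer limiting the capacity of the non-shared parts."""
    return lam * sum(np.abs(W).sum() for W in W_agent)

obs = rng.normal(size=obs_dim)
q = q_values(0, obs)
```

At initialization all agents behave identically (the specific parts are zero); during training, the L1 term lets an agent's values diverge from the shared ones only where that pays off, which is the sharing/diversity tradeoff the notes mention.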
Experiment
- Metrics:
- Benchmark tasks & Baselines:
Google Research Football(GRF), StarCraft II micro-management(SMAC)
CDS (proposed), QPLEX, QMIX, MAVEN, EOI
- Design:
Demonstration of how the approach works.
- Conclusion:
State-of-the-art results.
Thinking
- Pros:
A novel mechanism for introducing diversity, only when necessary, into shared multi-agent reinforcement learning
The balance between individual diversity and group coordination
- Cons:
No ablation studies are shown, and the L1 regularization is not explained in enough detail.
Links:
Video: https://sites.google.com/view/celebrate-diversity-shared
Code: https://github.com/lich14/CDS