Source
Cognitive Systems Research
DATE OF PUBLICATION
12/02/2024
Authors

Applying opponent and environment modelling in decentralised multi-agent reinforcement learning

Abstract

Multi-agent reinforcement learning (MARL) has recently gained popularity and achieved much success in different kinds of games, such as zero-sum, cooperative, or general-sum games. Nevertheless, the vast majority of modern algorithms assume information sharing during training and hence cannot be used in decentralised applications, nor do they readily scale to high-dimensional scenarios or applications with general or sophisticated reward structures. Moreover, because data collection is expensive and data are sparse in real-world applications, it becomes necessary to use world models that capture the environment dynamics through latent variables, i.e. to use a world model to generate synthetic data for training MARL algorithms. Therefore, focusing on the paradigm of decentralised training and decentralised execution, we propose an extension of model-based reinforcement learning approaches that relies on fully decentralised training with planning conditioned on the latent representations of neighbouring co-players. Our approach is inspired by the idea of opponent modelling. The method lets each agent learn in a joint latent space without needing to interact with the environment. We offer the approach as a proof of concept that decentralised model-based algorithms are able to produce emergent collective behaviour with limited communication during planning, and demonstrate its necessity on iterated matrix games and modified versions of the StarCraft Multi-Agent Challenge (SMAC).
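To make the core idea of the abstract concrete, the sketch below shows, in purely illustrative form, what planning in a joint latent space conditioned on neighbouring co-players' latents can look like. It is not the authors' implementation: the class name, network sizes, one-hot action encoding, greedy rollout scheme, and the assumption that neighbours' latents stay fixed during imagination are all simplifications introduced here for clarity.

```python
# Hypothetical sketch (not the paper's code): a decentralised agent that encodes
# its own observation, receives latents broadcast by neighbouring co-players,
# and rolls out a learned world model in imagination to pick an action.
import torch
import torch.nn as nn


class LatentWorldModelAgent(nn.Module):
    def __init__(self, obs_dim, latent_dim, n_actions, n_neighbours):
        super().__init__()
        joint_dim = latent_dim * (1 + n_neighbours)  # own latent + neighbours' latents
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        # Dynamics head: (joint latent, own action) -> imagined next own latent.
        self.dynamics = nn.Sequential(nn.Linear(joint_dim + n_actions, 64), nn.ReLU(),
                                      nn.Linear(64, latent_dim))
        # Reward head used to score imagined rollouts.
        self.reward = nn.Linear(joint_dim + n_actions, 1)
        self.n_actions = n_actions

    def encode(self, obs):
        return self.encoder(obs)

    @torch.no_grad()
    def plan(self, own_latent, neighbour_latents, horizon=3):
        """Score each action by repeating it for `horizon` imagined steps
        (no environment interaction) and return the best first action."""
        best_action, best_return = 0, float("-inf")
        for a in range(self.n_actions):
            z, ret = own_latent, 0.0
            a_onehot = torch.nn.functional.one_hot(
                torch.tensor(a), self.n_actions).float()
            for _ in range(horizon):
                # Simplification: neighbours' latents are held fixed in imagination.
                joint = torch.cat([z, *neighbour_latents], dim=-1)
                inp = torch.cat([joint, a_onehot], dim=-1)
                ret += self.reward(inp).item()
                z = self.dynamics(inp)  # imagined next latent
            if ret > best_return:
                best_action, best_return = a, ret
        return best_action


if __name__ == "__main__":
    agent = LatentWorldModelAgent(obs_dim=10, latent_dim=8, n_actions=4, n_neighbours=2)
    own = agent.encode(torch.randn(10))
    neighbours = [torch.randn(8), torch.randn(8)]  # latents received from co-players
    print(agent.plan(own, neighbours))
```

In this toy setup the only information exchanged between agents is their latent vectors, which mirrors the abstract's emphasis on decentralised training with limited communication during planning.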
