Source
SISY
DATE OF PUBLICATION
02/13/2023
Authors
Ilya Makarov, Artem Sayapin

Multi-Modal Deep Reinforcement Learning in ViZDoom with Audio Component

Abstract

This work discusses audiovisual reinforcement learning in the ViZDoom environment extended with an audio component. Two models with audiovisual features were compared: Asynchronous Proximal Policy Optimization (APPO) and Importance Weighted Actor-Learner Architecture (IMPALA). We trained the agents in two ViZDoom scenarios: Music Recognition and Duel. The agents learned to play in the Duel scenario and achieved stable performance in the Music Recognition scenario. IMPALA outperformed APPO in the Duel scenario, while APPO achieved results twice as good as IMPALA's in the Music Recognition scenario. Neither agent achieved decent results on the Music Recognition task, and directions for improvement are provided for future research.
