Multi-Modal Deep Reinforcement Learning in ViZDoom with Audio Component
This work examines audiovisual reinforcement learning in the ViZDoom environment extended with an audio component. Two models using audiovisual features were compared: Asynchronous Proximal Policy Optimization (APPO) and Importance Weighted Actor-Learner Architecture (IMPALA). We trained the agents in two ViZDoom scenarios: Music Recognition and Duel. The agents learned to play in the Duel scenario, while in the Music Recognition scenario they reached stable performance. IMPALA outperformed APPO in the Duel scenario, whereas APPO achieved results twice as good as IMPALA's in the Music Recognition scenario. Neither agent achieved satisfactory results on the Music Recognition task, and we outline directions for improvement that future research could pursue.