Dealing With Sparse Rewards Using Graph Neural Networks

Keywords: Deep reinforcement learning (DRL), Graph neural networks (GNNs), Partially observable Markov decision process (POMDP), Reward shaping

Abstract

Deep reinforcement learning in partially observable environments is difficult in itself and is further complicated when the reward signal is sparse. Most navigation tasks in three-dimensional environments provide the agent with minimal information: typically, the agent receives a visual observation from the environment and is rewarded once, at the end of the episode. A good reward function could substantially improve the convergence of reinforcement learning algorithms on such tasks. The classic approach to increasing the density of the reward signal is to augment it with supplementary rewards, a technique called reward shaping. In this study, we propose two modifications of a recent reward shaping method based on graph convolutional networks: the first uses advanced aggregation functions, and the second uses the attention mechanism. We empirically validate the effectiveness of our solutions on a navigation task in a 3D environment with sparse rewards. For the solution featuring the attention mechanism, we also show that the learned attention is concentrated on edges corresponding to important transitions in the 3D environment.
