Accelerating Transformers in Online RL
Abstract
The emergence of transformer-based models in Reinforcement Learning (RL) has expanded the range of feasible robotics tasks, but it has also introduced a wide range of implementation challenges, especially in model-free online RL. Most existing learning algorithms cannot be applied directly to transformer-based models because of the latter's training instability. In this paper, we propose a method that uses an Accelerator agent as a trainer for the transformer. During the first stage of the proposed algorithm, the Accelerator interacts with the environment on its own while simultaneously training the transformer through behavior cloning. In the second stage, the pretrained transformer begins to interact with the environment in a fully online setting. As a result, the algorithm accelerates the transformer's training, improves its performance, and helps it train online more stably.
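The two-stage scheme described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the Accelerator's own RL update, the environment, and the network architectures are all placeholder assumptions (the transformer policy is stood in for by a small MLP to keep the example self-contained).

```python
# Hedged sketch of the two-stage scheme: an Accelerator agent collects
# experience and teaches a (placeholder) transformer policy by behavior
# cloning, after which the transformer acts online on its own.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
OBS_DIM, ACT_DIM = 8, 2

class DummyEnv:
    """Toy stand-in for a robotics environment."""
    def reset(self):
        return torch.randn(OBS_DIM)

    def step(self, action):
        # Random next observation, quadratic action cost, never terminates.
        return torch.randn(OBS_DIM), -float(action.pow(2).sum()), False

# The Accelerator would be trained with an off-the-shelf RL method;
# its update is elided here and it acts as a fixed teacher.
accelerator = nn.Sequential(
    nn.Linear(OBS_DIM, 32), nn.Tanh(), nn.Linear(32, ACT_DIM))

# Placeholder for the actual transformer policy.
transformer_policy = nn.Sequential(
    nn.Linear(OBS_DIM, 32), nn.ReLU(), nn.Linear(32, ACT_DIM))

bc_opt = torch.optim.Adam(transformer_policy.parameters(), lr=1e-2)
env = DummyEnv()

# Stage 1: the Accelerator interacts with the environment while the
# transformer is trained by behavior cloning on the Accelerator's actions.
obs = env.reset()
for _ in range(200):
    with torch.no_grad():
        teacher_action = accelerator(obs)
    student_action = transformer_policy(obs)
    bc_loss = ((student_action - teacher_action) ** 2).mean()
    bc_opt.zero_grad()
    bc_loss.backward()
    bc_opt.step()
    obs, _, _ = env.step(teacher_action)

# Stage 2: the pretrained transformer now acts online; a fully online RL
# objective (omitted here) would continue to update it from its own rollouts.
obs = env.reset()
with torch.no_grad():
    action = transformer_policy(obs)
print(action.shape)  # torch.Size([2])
```

The key design point the sketch captures is that behavior cloning gives the transformer a stable supervised pretraining signal before it is exposed to the instabilities of fully online, model-free RL.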