Source
Neuroinformatics
Publication date
20.10.2024
Authors
Денис Васильев, Артем Латышев, Петр Кудеров, Nutsu Shiman, Александр Панов

Dynamical Distance Adaptation in Goal-Conditioned Model-Based Reinforcement Learning

Abstract

Goal-conditioned reinforcement learning aims to develop agents capable of reaching any state within a defined environment. Given the diversity of potential goals, reward engineering can become cumbersome, so it is beneficial to design algorithms that can train without external rewards. This approach is formalized as unsupervised goal-conditioned reinforcement learning (UGCRL), wherein the goal space is a subset of the environmental states. Achieving this objective requires engineering goal-conditioned rewards. In this work, we analyze goal-conditioned rewards based on distances between states in a model-based setting and examine how the behavior of the distance function depends on the representations used to train it. We conducted experiments in continuous maze environments: PointMaze is a labyrinth with a complex topology but simple control, while AntMaze is topologically simple but requires complex control. We found that our method showed some improvement on distant goals in PointMaze, while in AntMaze it demonstrated performance comparable to the baseline.
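To illustrate the idea of a distance-based goal-conditioned reward, below is a minimal sketch. It assumes a placeholder state encoder and a fixed Euclidean distance in representation space; the names `encode`, `distance`, and `goal_conditioned_reward` are illustrative and do not reflect the authors' actual implementation, which studies learned distance functions and different state representations.

```python
import numpy as np


def encode(state: np.ndarray) -> np.ndarray:
    # Placeholder representation: identity mapping. In practice this would be
    # a learned encoder, e.g. the latent state of a world model.
    return state


def distance(z_state: np.ndarray, z_goal: np.ndarray) -> float:
    # Placeholder metric: Euclidean norm in representation space. The paper
    # considers learned (dynamical) distances rather than a fixed metric.
    return float(np.linalg.norm(z_state - z_goal))


def goal_conditioned_reward(state: np.ndarray, goal: np.ndarray) -> float:
    # Dense goal-conditioned reward: the closer the state representation is
    # to the goal representation, the higher (less negative) the reward.
    return -distance(encode(state), encode(goal))


# Toy usage: a 2D point agent and a goal position in a maze-like layout.
state = np.array([0.5, 1.0])
goal = np.array([3.0, 2.0])
print(goal_conditioned_reward(state, goal))  # approximately -2.69
```

Such a reward removes the need for task-specific reward engineering, since any reachable state can serve as a goal; the quality of the resulting behavior then hinges on how well the chosen representation and distance reflect the true reachability structure of the environment.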
