Source
NeurIPS
Publication date
December 9, 2024
Authors
Arip Asadulaev, Alexander Korotin, Vahe Egiazarian, Rostislav Korst, Andrey Filchenkov, Evgeny Burnaev

Rethinking Optimal Transport in Offline Reinforcement Learning

Abstract

We present a novel approach for offline reinforcement learning that bridges the gap between recent advances in neural optimal transport and reinforcement learning algorithms. Our key idea is to compute the optimal transport between states and actions with an action-value cost function and implicitly recover an optimal map that can serve as a policy. Building on this concept, we develop a new algorithm called Extremal Monge Reinforcement Learning that treats offline reinforcement learning as an extremal optimal transport problem. Unlike previous transport-based offline reinforcement learning algorithms, our method focuses on improving the policy beyond the behavior policy, rather than addressing the distribution shift problem. We evaluated the performance of our method on various continuous control problems and demonstrated improvements over existing algorithms.
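The abstract describes casting the policy as a Monge transport map from states to actions under the cost $c(s, a) = -Q(s, a)$, trained via a maximin objective against a potential function. Below is a minimal, hedged sketch of one such adversarial neural-OT update; it is not the authors' released code, and the network architectures, dimensions, learning rates, and the stand-in critic are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch of "optimal transport between states and actions with an
# action-value cost": a transport map T (the policy) and a potential f are
# trained adversarially with c(s, a) = -Q(s, a) as the cost. All sizes,
# architectures, and the critic below are illustrative assumptions.

state_dim, action_dim = 17, 6  # e.g. a MuJoCo locomotion task

policy = nn.Sequential(                      # transport map T: states -> actions
    nn.Linear(state_dim, 256), nn.ReLU(),
    nn.Linear(256, action_dim), nn.Tanh())
potential = nn.Sequential(                   # potential f on the action space
    nn.Linear(action_dim, 256), nn.ReLU(),
    nn.Linear(256, 1))
critic = nn.Sequential(                      # stand-in for a Q-function fitted offline
    nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
    nn.Linear(256, 1))

opt_T = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_f = torch.optim.Adam(potential.parameters(), lr=3e-4)

def q(states, actions):
    return critic(torch.cat([states, actions], dim=-1))

def ot_step(states, actions):
    """One maximin step: f separates dataset actions from mapped actions,
    then T minimizes the cost -Q(s, T(s)) corrected by the potential."""
    # Potential update: push f up on dataset actions, down on mapped actions
    mapped = policy(states).detach()
    loss_f = potential(mapped).mean() - potential(actions).mean()
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()

    # Map update: inner minimization of c(s, T(s)) - f(T(s))
    mapped = policy(states)
    loss_T = (-q(states, mapped) - potential(mapped)).mean()
    opt_T.zero_grad(); loss_T.backward(); opt_T.step()

# Toy usage on random tensors standing in for an offline batch
s = torch.randn(256, state_dim)
a = torch.rand(256, action_dim) * 2 - 1
ot_step(s, a)
```

On a plain reading of the abstract, the "extremal" variant relaxes the usual requirement that the map exactly reproduce the dataset's action distribution, which is what lets the recovered policy improve beyond the behavior policy rather than merely imitate it.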
