Source: ACL
Date of publication: 07/27/2025
Authors: Zoya Volovikova, Peter Kuderov, Grygory Gorbov, Alexander Panov, Alexey Skrynnik
CrafText Benchmark: Advancing Language Grounding in Complex Multimodal Open-Ended World
Abstract
Grounding language models in multimodal environments is a pivotal challenge in AI: grounding is what lets agents link linguistic inputs with sensory data such as visual information. Existing environments, however, often limit the complexity of agent behavior because their dynamics or vocabulary are restricted. To address these limitations, we propose CrafText, a new benchmark built on the Craftax environment, a dynamic, stochastic setting with extensive game mechanics and a rich vocabulary. The benchmark evaluates agents on complex tasks that require spatial reasoning, logical reasoning, and use of context, offering a rigorous platform for advancing multimodal AI research.