Источник
MICAI
Год публикации
2021
Авторы
Александр Панов Алексей Ковалёв Даниил Кириленко Евгений Осипов
Поделиться

Question Answering for Visual Navigation in Human-Centered Environments

Аннотация

In this paper, we propose an HISNav VQA dataset – a challenging dataset for a Visual Question Answering task that is aimed at the needs of Visual Navigation in human-centered environments. The dataset consists of images of various room scenes that were captured using the Habitat virtual environment and of questions important for navigation tasks using only visual information. We also propose a baseline for a HISNav VQA dataset, a Vector Semiotic Architecture, and demonstrate its performance. The Vector Semiotic Architecture is a combination of a Sign-Based World Model and Vector Symbolic Architectures. The Sign-Based World Model allows representing various aspects of an agent’s knowledge, and Vector Symbolic Architectures serve on a low computational level. The Vector Semiotic Architecture addresses the symbol grounding problem that plays an important role in the Visual Question Answering Task.

Присоединяйтесь к AIRI в соцсетях