Source
Logic Journal of the IGPL
Date of publication
03/22/2024
Authors
Alexander Panov, Anfisa Chuganskaya, Alexey Kovalev

Sign-based image criteria for social interaction visual question answering

Abstract

Multi-modal tasks have started to play a significant role in artificial intelligence research. A particular example from this domain is visual–linguistic tasks, such as visual question answering. The progress of modern machine learning systems is determined, among other things, by the data on which these systems are trained. Most modern visual question answering data sets contain only limited types of questions that can be answered either by directly accessing the image itself or by using external data. At the same time, insufficient attention is paid to social interactions between people, which limits the scope of visual question answering systems. In this paper, we propose criteria, based on psychological research, for selecting images that are suitable for composing social interaction visual question answering questions. We believe these criteria should help advance visual question answering systems.
