Source
ACL / GenBench
Date of publication
11/16/2024
Authors
Alexander Panchenko
Anton Razzhigaev
Denis Dimitrov
Elizaveta Goncharova
Maxim Kurkin
Irina Abdullaeva
Anastasia Lysenko
OmniDialog: A Multimodal Benchmark for Generalization Across Text, Visual, and Audio Modalities
Abstract
We introduce OmniDialog, the first comprehensive trimodal benchmark grounded in a knowledge graph (Wikidata), to evaluate the generalization of Large Multimodal Models (LMMs) across three modalities: text, visual, and audio. Our benchmark consists of more than 4,000 dialogues, each averaging 10 turns, all annotated and cross-validated by human experts. The dialogues are designed to prevent shortcut learning by incorporating varied formats and misleading or irrelevant multimodal cues. We also evaluate both multimodal and unimodal models to gain insight into how they process the modality inputs introduced during a conversation.