TabaQA at SemEval-2025 Task 8: Column Augmented Generation for Question Answering over Tabular Data
Abstract
The DataBench shared task in the SemEval-2025 competition aims to tackle the problem of question answering (QA) over tabular data. Because table structures are diverse, there are different approaches to retrieving the answer. Although Retrieval-Augmented Generation is a viable solution, extracting relevant information from tables remains a significant challenge. In addition, a table can be prohibitively large for direct integration into the LLM context. In this paper, we address QA over tabular data in three steps: first, we identify the relevant columns that might contain the answer; then, the LLM generates an answer given the context of those columns; finally, the LLM refines its answer. This approach secured us 7th place in the DataBench lite category.
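The three steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the prompt wording, the `max_rows` row cap, and the `fake_llm` stand-in are all assumptions made for illustration, and the table is represented as a plain list of dictionaries rather than any particular data-frame library.

```python
from typing import Callable, Dict, List

Table = List[Dict[str, object]]
LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def select_columns(question: str, columns: List[str], llm: LLM) -> List[str]:
    """Step 1: ask the model which columns may hold the answer."""
    prompt = (
        "Columns: " + ", ".join(columns) + "\n"
        "Question: " + question + "\n"
        "List the relevant columns, comma-separated."
    )
    picked = [name.strip() for name in llm(prompt).split(",")]
    # Keep only names that really exist in the table.
    valid = [name for name in picked if name in columns]
    # Fall back to every column if the model returned nothing usable.
    return valid or columns


def build_context(table: Table, columns: List[str], max_rows: int = 5) -> str:
    """Serialize only the selected columns; max_rows is an assumed placeholder cap."""
    lines = [" | ".join(columns)]
    for row in table[:max_rows]:
        lines.append(" | ".join(str(row[c]) for c in columns))
    return "\n".join(lines)


def answer_question(question: str, table: Table, llm: LLM, max_rows: int = 5) -> str:
    columns = list(table[0].keys())
    relevant = select_columns(question, columns, llm)
    context = build_context(table, relevant, max_rows)
    # Step 2: generate a draft answer from the reduced context.
    draft = llm(f"Context:\n{context}\nQuestion: {question}\nAnswer:")
    # Step 3: ask the model to refine its own draft.
    final = llm(
        f"Context:\n{context}\nQuestion: {question}\n"
        f"Draft answer: {draft}\nRefine the draft answer:"
    )
    return final.strip()


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, used only to exercise the flow."""
    if prompt.startswith("Columns:"):
        return "age, bogus"
    if "Draft answer:" in prompt:
        return " 41 "
    return "maybe 41"


table = [
    {"name": "Ann", "age": 41, "city": "Oslo"},
    {"name": "Bob", "age": 35, "city": "Rome"},
]
result = answer_question("How old is Ann?", table, fake_llm)
```

Restricting the context to the selected columns is what keeps the prompt small when the full table would not fit into the LLM context. In a real system, `fake_llm` would be replaced by a call to an actual model.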