Source
ECIR
Publication date
06.04.2025
Authors
Nikita Krayko, Ivan Sidorov, Fedor Laputin, Alexander Panchenko, Daria Galimzyanova, Vasily Konovalov

RURAGE: Robust Universal RAG Evaluator for Fast and Affordable QA Performance Testing

Abstract

The advent of Large Language Models (LLMs) has significantly propelled the popularity and demand for Question Answering (QA), particularly the Retrieval-Augmented Generation (RAG) approach, for a plethora of business needs and applications, most notably for user support and assistance of various kinds. Such industrial NLP systems enable the scaling and optimization of business processes, driving efficiency and innovation. Given the pivotal roles of information retrieval and generation in RAG, the need for swift, continuous evaluation of system performance becomes crucial. We introduce the open-source RURAGE framework, designed to assess the quality of QA responses through a combination of straightforward lexical analysis, model-based assessments, and uncertainty metrics. Our empirical findings demonstrate that RURAGE's ensemble of features achieves outcomes comparable to more resource-intensive evaluations utilizing LLM-as-a-judge, facilitating rapid development in industry settings.
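As an illustration of the "straightforward lexical analysis" component the abstract mentions, here is a minimal sketch of one such feature: token-overlap F1 between a generated answer and a reference answer. The function names and the ensemble weighting are hypothetical illustrations, not RURAGE's actual API.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a candidate answer and a reference answer,
    a standard lexical QA-evaluation feature."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts tokens shared by both answers.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def ensemble_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-feature scores (lexical, model-based,
    uncertainty); the weights here are placeholders, not tuned values."""
    total = sum(weights.values())
    return sum(weights[name] * features[name] for name in weights) / total
```

Such cheap lexical features are what make evaluation of this kind fast and affordable compared to calling an LLM-as-a-judge for every answer.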
