Source
ACL
Publication date
17.02.2024
Authors
Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

Abstract

The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. The need to identify and differentiate such content from genuine human-generated text is critical in combating disinformation, preserving the integrity of education and scientific fields, and maintaining trust in communication. In this work, we address this problem by introducing a new multilingual, multi-domain, and multi-generator benchmark for MGT detection: M4GT-Bench. It is collected for three task formulations: (1) mono-lingual and multi-lingual binary MGT detection; (2) multi-way detection, which identifies which particular model generated the text; and (3) human-machine mixed text detection, where a word boundary delimiting MGT from human-written content should be determined. Human evaluation for Task 2 shows performance below random guessing, demonstrating how challenging it is to distinguish individual LLMs. Promising results consistently occur when training and test data are drawn from the same domain or the same generators.
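To make the first task formulation concrete, here is a minimal illustrative sketch of binary MGT detection as a text-classification problem. This is a toy bag-of-words baseline on hypothetical example sentences, not the paper's detectors or the M4GT-Bench data loader; in the benchmark itself, such a classifier would be trained and evaluated on the multilingual, multi-domain corpus.

```python
# Toy sketch of Task 1 (binary MGT detection): a TF-IDF + logistic
# regression baseline. Labels: 0 = human-written, 1 = machine-generated.
# The training sentences below are invented examples, not benchmark data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I can't believe the game last night, what a mess lol",
    "We hiked up the ridge and got caught in the rain halfway down.",
    "As an AI language model, I can provide a comprehensive overview of the topic.",
    "In conclusion, the aforementioned factors collectively contribute to the outcome.",
]
train_labels = [0, 0, 1, 1]

# Word unigrams and bigrams feed a linear classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)

# Predict on an unseen sentence; output is a 0/1 label.
pred = clf.predict(["As an AI language model, I can summarize the key points."])
print(pred[0])
```

A real detector for this benchmark would of course need far more data and typically a fine-tuned multilingual encoder, but the task interface is the same: text in, binary human/machine label out.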
