Source
AISTATS
Publication date
27.05.2022
Authors
Tatiana Shavrina
Alena Fenogenova
Alexander Panchenko
Daryna Dementieva
Varvara Logacheva
Irina Nikishina
Irina Krotova
A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification
Abstract
It is often difficult to reliably evaluate models which generate text. Among them, text style transfer is particularly difficult to evaluate, because its success depends on a number of parameters. We conduct an evaluation of a large number of models on a detoxification task. We explore the relations between the manual and automatic metrics and find that there is only weak correlation between them, which is dependent on the type of model which generated the text. Automatic metrics tend to be less reliable for better-performing models. However, our findings suggest that ChrF and BertScore metrics can be used as a proxy for human evaluation of text detoxification to some extent.