Source
AISTATS
Publication date
27.05.2022
Authors
Tatiana Shavrina, Alena Fenogenova, Alexander Panchenko, Daryna Dementieva, Varvara Logacheva, Irina Nikishina, Irina Krotova

A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification

Abstract

It is often difficult to reliably evaluate models which generate text. Among them, text style transfer is particularly difficult to evaluate, because its success depends on a number of parameters. We conduct an evaluation of a large number of models on a detoxification task. We explore the relations between the manual and automatic metrics and find that there is only a weak correlation between them, and that it depends on the type of model that generated the text. Automatic metrics tend to be less reliable for better-performing models. However, our findings suggest that the ChrF and BertScore metrics can be used as a proxy for human evaluation of text detoxification to some extent.
