Source
AISTATS
Date of publication
05/27/2022
Authors
Tatyana Shavrina
Alena Fenogenova
Alexander Panchenko
Daryna Dementieva
Varvara Logacheva
Irina Nikishina
Irina Krotova
A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification
Abstract
It is often difficult to reliably evaluate models that generate text. Among such tasks, text style transfer is particularly difficult to evaluate because its success depends on a number of parameters. We conduct an evaluation of a large number of models on a detoxification task. We explore the relations between manual and automatic metrics and find that there is only a weak correlation between them, which depends on the type of model that generated the text. Automatic metrics tend to be less reliable for better-performing models. However, our findings suggest that the ChrF and BertScore metrics can be used as a proxy for human evaluation of text detoxification to some extent.