ru

About
Publications
Blog
Careers

Source

AISTATS

DATE OF PUBLICATION

05/27/2022

Authors

Varvara Logacheva, Daryna Dementieva, Irina Krotova, Alena Fenogenova, Irina Nikishina, Tatyana Shavrina, Alexander Panchenko

A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification

Abstract

It is often difficult to reliably evaluate models that generate text. Text style transfer is particularly difficult to evaluate because its success depends on a number of parameters. We conduct an evaluation of a large number of models on a detoxification task. We explore the relations between manual and automatic metrics and find that there is only a weak correlation between them, which depends on the type of model that generated the text. Automatic metrics tend to be less reliable for better-performing models. However, our findings suggest that the ChrF and BertScore metrics can be used as a proxy for human evaluation of text detoxification to some extent.
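
As a rough illustration of what comparing manual and automatic metrics can look like, the sketch below computes a Spearman rank correlation between human judgments and an automatic metric. This is a minimal example and not the paper's evaluation code: the function names and the score lists are hypothetical, and the abstract does not state which correlation coefficient the authors used.

```python
from statistics import mean


def ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five system outputs (illustrative only, not data from the paper).
human_scores = [1.0, 0.5, 0.0, 1.0, 0.5]
metric_scores = [0.82, 0.64, 0.31, 0.77, 0.70]

rho = spearman(human_scores, metric_scores)
print(f"Spearman correlation: {rho:.3f}")
```

Running the same computation separately for each group of models would be one way to see whether the correlation depends on the type of model, as the abstract reports.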

Full text

Similar publications

EAI: Emotional Decision-Making of LLMs in Strategic Games and Ethical Dilemmas

Mikhail Mozikov, Nikita Severin, Valeria Bodishtianu, Maria Glushanina, Ivan Nasonov, Daniil Orekhov, Vladislav Pekhotin, Ivan Makovetskiy, Mikhail Baklashkin, Vasily Lavrentyev, Akim Tsvigun, Denis Turdakov, Tatyana Shavrina, Andrey Savchenko, Ilya Makarov

SOURCE

SkipCLM: Enhancing Crosslingual Alignment of Decoder Transformer Models via Contrastive Learning and Skip Connection

Nikita Sushko, Alexander Panchenko, Elena Tutubalina

SOURCE

Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images

Elisei Rykov, Ksenia Petrushina, Ksenia Titova, Anton Razzhigaev, Alexander Panchenko, Vasily Konovalov

SOURCE

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Sergey Pletenev, Maria Marina, Daniil Moskovskiy, Vasily Konovalov, Pavel Braslavski, Alexander Panchenko, Mikhail Salnikov

SOURCE

Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models

Artem Vazhentsev, Lyudmila Rvanova, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov

SOURCE

SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators

Daniil Moskovskiy, Nikita Sushko, Sergey Pletenev, Alexander Panchenko, Elena Tutubalina

SOURCE

Semantically-Informed Regressive Encoder Score

Vasiliy Viskov, George Kokush, Daniil Larionov, Steffen Eger, Alexander Panchenko

SOURCE

AIRI Institute

You can ask us a question or suggest a joint project in the field of AI

event@airi.net

For event invitations

partner@airi.net

For scientific cooperation and partnership

pr@airi.net

For journalists and media

people@airi.net

For any questions connected with employees and employment

© 2025, AIRI
