Source
CLEF / PAN
DATE OF PUBLICATION
08/15/2024

PAN 2024 Multilingual TextDetox: Exploring Cross-lingual Transfer Using Large Language Models

Abstract

Text detoxification is a text-to-text generation task whose experiments depend on available data. In recent years, work on this task has focused primarily on well-resourced languages and neglected lower-resource ones. This work explores several approaches to building a multilingual solution, with an emphasis on the 9 languages of the Multilingual Text Detoxification Task at PAN 2024. In our experiments, we consider different model types and fine-tune on various combinations of datasets. As a result, we achieve third place in human evaluation and show promising progress towards a multilingual solution for text detoxification using large language models such as mT0 and XGLM. We also observe that fine-tuning on combinations of relatively similar languages is a promising direction, especially when real data for some languages is lacking.
