Source
CLEF / PAN
Date of publication
08/15/2024
Authors
SomethingAwful at PAN 2024 TextDetox: Uncensored Llama 3 Helps to Censor Better.
Abstract
In this paper, we report on our system for the Multilingual Text Detoxification Task at PAN 2024. In this task, we needed to detoxify a multilingual corpus of texts. We propose an approach based on a large language model built on the Llama 3 architecture, with an additional method for jailbreaking model generation refusals. Our approach shows an advantage over human references for multiple languages in manual evaluation and outperforms baselines in the automatic detoxification benchmark. Our work contributes to the ongoing effort to assess the vulnerability of LLMs to jailbreaking attacks, underscoring the latent capabilities of large models.