Source
CLEF / PAN
Publication date
15.08.2024
Authors
Сергей Плетенёв

SomethingAwful at PAN 2024 TextDetox: Uncensored Llama 3 Helps to Censor Better.

Abstract

In this paper, we report on our system for the Multilingual Text Detoxification Task at PAN 2024. The task required detoxifying a multilingual corpus of texts. We propose an approach based on large language models built on the Llama 3 architecture, combined with an additional method for jailbreaking the model's generation refusals. In manual evaluation, our approach shows an advantage over the human references for multiple languages, and it outperforms the baselines in the automatic detoxification benchmark. Our work contributes to the ongoing effort to assess the vulnerability of LLMs to jailbreaking attacks, underscoring the latent capabilities of large models.
