Utterance-Aware Adaptive Data Labeling and Summarization: Exploiting Large Language Models for Unbiased Dialog Annotation

Abstractive summarization, conversational summarization, data augmentation, evaluation, language model biases, semantic textual similarity

Abstract

The field of dialogue summarization has advanced significantly with large language models (LLMs), but their effectiveness can be limited by the size and diversity of training data, as well as by concerns about bias. This study proposes a data augmentation method to address the lack of open-source dialogue datasets for summarization while reducing potential biases. Our method uses algorithms that process relationships between key phrases in a dialogue and its summary points, with separate approaches for dialogues that fit within the model’s context and those that exceed it. We extract the necessary relationships between dialogue and summary using an LLM adapted to pre-labeled data, which achieves up to 88.26% accuracy compared with human annotation. We achieved a 4.33x expansion of the original DialogSum, SAMSum, and TweetSumm training sets, leading to a 0.16-point improvement in ROUGE-Lsum (up to 76% growth compared to the baseline). Additionally, we introduce a novel summarization metric tailored to larger-than-context summarization models during inference, capturing the semantic similarity and comprehensiveness of summary points. This metric contributes to the credibility and sustainability of dialogue summarization systems by providing a more robust evaluation framework.
