Source
Interspeech
Publication date
20.08.2023
Authors
Evgeny Burnaev, Sergey Nikolenko, Serguei Barannikov, Eduard Tulchinskii, Laida Kushnareva, Irina Piontkovskaya, Daniil Cherniavskii, Kristian Kuznetsov

Topological Data Analysis for Speech Processing

Abstract

We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about 9% in accuracy and 5% in EER on four common datasets; on CREMA-D, the proposed feature set reaches a new state-of-the-art accuracy of 80.155. We also show that topological features can reveal the functional roles of speech Transformer heads; e.g., we find heads capable of distinguishing between pairs of sample sources (natural/synthetic) or voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new approach for speech analysis, especially for tasks that require structural prediction. Appendices, an introduction to TDA, and other additional materials are available at https://topohubert.github.io/speech-topology-webpages/
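
To give a rough sense of what a topological feature of an attention map can look like, here is a minimal sketch. It is an illustration under stated assumptions, not the feature set used in the paper: the function name `h0_bar_lengths`, the symmetrization by elementwise maximum, and the pseudo-distance `1 - attention` are all choices made for this example. It computes the finite bars of the zero-dimensional persistence barcode of the attention graph, which coincide with the edge weights of its minimum spanning tree.

```python
import numpy as np


def h0_bar_lengths(attention: np.ndarray) -> np.ndarray:
    """Finite H0 bar lengths of the graph filtration of one attention map.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    n = attention.shape[0]
    # Assumption: symmetrize by elementwise max and map strong attention
    # to short edges, so weights lie in [0, 1] for a row-stochastic map.
    dist = 1.0 - np.maximum(attention, attention.T)

    # Enumerate each undirected edge once and sort by weight (Kruskal).
    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(dist[rows, cols], kind="stable")

    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    bars = []
    for e in order:
        a, b = find(int(rows[e])), find(int(cols[e]))
        if a != b:
            # Two components merge: one H0 bar dies at this edge weight.
            parent[a] = b
            bars.append(dist[rows[e], cols[e]])
    return np.array(bars)


# Toy 3-token attention map (made-up numbers, rows sum to 1).
toy_attention = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.5, 0.0, 0.5],
])
toy_bars = h0_bar_lengths(toy_attention)
toy_feature = float(toy_bars.sum())  # one scalar feature per attention head
```

In a setup like the one the abstract describes, a scalar of this kind would be computed for each head and layer and the resulting vector passed to a linear classifier; the exact features and classifier used by the authors are described in the paper and its supplementary materials.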
