Source
Neuroinformatics
Publication date
19.10.2022
Authors
Mikhail Burtsev, Beksultan Sagyndyk, Dilyara Baymurzina

DeepPavlov Topics: Topic Classification Dataset for Conversational Domain in English

Abstract

This paper presents “DeepPavlov Topics”, a new dataset for topic classification in the conversational domain. The dataset was collected and filtered automatically from websites and open datasets. We identify 33 topics and present full (4.2M samples) and down-sampled (2.2M samples) versions of “DeepPavlov Topics”. The proposed topics are intended to cover the conversational domain in detail while remaining interpretable. We report classification results for baseline models trained in a multi-label setup, which allows a text to be assigned multiple classes at inference time. We also release pre-trained models for topic classification, including distilled and multilingual versions.
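To illustrate what a multi-label setup means at inference time, here is a minimal sketch, not the authors' implementation. It assumes the classifier produces one independent score (logit) per topic, that each score is passed through a sigmoid, and that every topic whose probability reaches a threshold is returned. The topic names, the logit values, the function name `predict_topics`, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_topics(logits, labels, threshold=0.5):
    """Return every label whose sigmoid probability reaches the threshold.

    Each topic is scored independently, so a text may receive zero,
    one, or several topics. The 0.5 default is an assumed value.
    """
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical topic names and logits, for illustration only.
labels = ["Food", "Travel", "Sports"]
logits = [2.0, 0.3, -1.5]

print(predict_topics(logits, labels))  # ['Food', 'Travel']
```

Unlike a softmax over classes, which forces exactly one winner, independent sigmoids let a single utterance about, say, restaurants abroad be tagged with both topics, or with none if no score clears the threshold.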
