Source
Neuroinformatics
Publication date
19.10.2022
Authors
Mikhail Burtsev
Beksultan Sagyndyk
Dilyara Baymurzina
DeepPavlov Topics: Topic Classification Dataset for Conversational Domain in English
Topic classification,
Topic classifier,
Topic modelling,
Open dataset,
Dialogue system,
Chatbots,
Conversational domain
Abstract
This paper presents “DeepPavlov Topics”, a new dataset for topic classification in the conversational domain. The dataset was collected and filtered automatically from websites and open datasets. We identify 33 topics and present full (4.2M samples) and down-sampled (2.2M samples) versions of “DeepPavlov Topics”. The proposed topics aim to cover the conversational domain in detail while maintaining interpretability. We report baseline classification results for models trained in a multi-label setup, which allows multiple classes per text during inference. We also release pre-trained topic classification models, including distilled and multilingual versions.
You can ask us a question or propose a joint AI project
partner@airi.net
For scientific collaboration
and partnership
pr@airi.net
For journalists and media