Source
Neuroinformatics
DATE OF PUBLICATION
10/19/2022
Authors
Mikhail Burtsev
Beksultan Sagyndyk
Dilyara Baymurzina
DeepPavlov Topics: Topic Classification Dataset for Conversational Domain in English
Topic classification,
Topic classifier,
Topic modelling,
Open dataset,
Dialogue system,
Chatbots,
Conversational domain
Abstract
This paper presents “DeepPavlov Topics”, a new dataset for topic classification in the conversational domain. The dataset was collected and filtered automatically from websites and open datasets. We identify 33 topics and present full (4.2M samples) and down-sampled (2.2M samples) versions of “DeepPavlov Topics”. The proposed topics aim to cover the conversational domain in detail while remaining interpretable. We report baseline classification results for models trained in a multi-label setup, which allows multiple classes per text during inference. We also release pre-trained topic classification models, including distilled and multilingual versions.
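To make the multi-label setup mentioned in the abstract concrete, here is a minimal sketch of how multi-label inference typically works: each topic gets an independent sigmoid score, and every topic whose score reaches a threshold is returned. The topic names, the logits, and the 0.5 threshold are illustrative assumptions, not values taken from the paper or the released models.

```python
import math

# Hypothetical subset of topic labels for illustration only;
# the dataset itself defines 33 topics.
TOPICS = ["Music", "Movies", "Food", "Sports"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_topics(logits, threshold=0.5):
    """Return every topic whose independent probability reaches the threshold.

    Unlike softmax-based single-label classification, the scores are not
    normalized across topics, so zero, one, or several topics may be returned.
    """
    if len(logits) != len(TOPICS):
        raise ValueError("expected one logit per topic")
    return [topic for topic, z in zip(TOPICS, logits) if sigmoid(z) >= threshold]


# Assumed example logits, as a classifier head might produce for one utterance.
example_logits = [2.0, 1.2, -3.0, -0.5]
print(predict_topics(example_logits))  # ['Music', 'Movies']
```

Because each topic is scored independently, a single utterance can be assigned several topics or none at all; the threshold would normally be tuned on validation data.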