Source
Language Resources and Evaluation
Publication date
21.09.2023
Authors
Artem Shelmanov, Ekaterina Artemova, Elena Tutubalina, Vladimir Ivanov, Suresh Manandhar, Natalia Loukachevitch, Igor Rozhkov, Pavel Braslavski, Tatiana Batura, Alexander Pugachev, Alexey Yandutov

NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links

Abstract

This paper describes NEREL, a Russian news dataset suited for three tasks: nested named entity recognition, relation extraction, and entity linking. Compared to flat entities, nested named entities provide a richer and more complete annotation while also increasing the coverage of relation annotation and entity linking. Relations between nested named entities may cross entity boundaries to connect to shorter entities nested within longer ones, which makes such relations harder to detect. NEREL is currently the largest Russian dataset annotated with entities and relations: it comprises 29 named entity types and 49 relation types. At the time of writing, the dataset contains 56K named entities and 39K relations annotated in 933 person-oriented news articles. NEREL is annotated with relations at three levels: (1) within nested named entities, (2) within sentences, and (3) across sentence boundaries. We provide a benchmark evaluation of current state-of-the-art methods on all three tasks. The dataset is freely available at https://github.com/nerel-ds/NEREL.
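
To make the three relation levels concrete, here is a minimal Python sketch that classifies an entity pair as nested, within-sentence, or cross-sentence. It is an illustration under stated assumptions, not the dataset's own tooling: the `Entity` class, the helper functions, the example sentence, and the labels used are all hypothetical, and entities are assumed to be character-offset spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity span: character offsets, end exclusive."""
    start: int
    end: int
    label: str


def span(text: str, phrase: str, label: str) -> Entity:
    """Build an Entity from the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Entity(start, start + len(phrase), label)


def contains(outer: Entity, inner: Entity) -> bool:
    """True if inner lies inside outer and the two spans differ."""
    return outer != inner and outer.start <= inner.start and inner.end <= outer.end


def sentence_index(entity: Entity, bounds: list[tuple[int, int]]) -> int:
    """Index of the sentence whose [start, end) range holds the entity start."""
    for i, (s, e) in enumerate(bounds):
        if s <= entity.start < e:
            return i
    raise ValueError("entity lies outside all sentence bounds")


def relation_level(head: Entity, tail: Entity, bounds: list[tuple[int, int]]) -> str:
    """Classify a relation by the three levels named in the abstract."""
    if contains(head, tail) or contains(tail, head):
        return "nested"
    if sentence_index(head, bounds) == sentence_index(tail, bounds):
        return "sentence"
    return "cross-sentence"


# Illustrative text; the labels below are assumptions for this example.
text = "The mayor of Moscow met Ivan Petrov. He later flew to Kazan."
first_end = text.index(".") + 1
bounds = [(0, first_end), (first_end, len(text))]

mayor = span(text, "mayor of Moscow", "PROFESSION")
moscow = span(text, "Moscow", "CITY")  # nested inside `mayor`
petrov = span(text, "Ivan Petrov", "PERSON")
kazan = span(text, "Kazan", "CITY")

levels = {
    "mayor-moscow": relation_level(mayor, moscow, bounds),
    "mayor-petrov": relation_level(mayor, petrov, bounds),
    "petrov-kazan": relation_level(petrov, kazan, bounds),
}
```

In this sketch, the pair whose tail sits inside the longer span is the case the abstract calls a relation within a nested named entity; a flat annotation scheme would keep only the outer span and could not express that link.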
