Source

LREC

DATE OF PUBLICATION

10/13/2022

Authors

Oleg Serikov Timofey Atnashev Veronika Ganeeva Roman Kazakov Daria Matyash Michael Sonkin Ekaterina Voloshina Ekaterina Artemova

Razmecheno: Named Entity Recognition from Digital Archive of Diaries “Prozhito”

Named entity recognition, Text annotation, Datasets

Abstract

The vast majority of existing datasets for Named Entity Recognition (NER) are built primarily on news, research papers and Wikipedia, with a few exceptions created from historical and literary texts. Moreover, English is the main source of data for further labelling. This paper aims to fill multiple gaps by creating a novel dataset, "Razmecheno", gathered from the Russian-language diary texts of the project "Prozhito". Our dataset is of interest for multiple research lines: literary studies of diary texts, transfer learning from other domains, and low-resource or cross-lingual named entity recognition. Razmecheno comprises 1331 sentences and 14119 tokens, sampled from diaries written during Perestroika. The annotation schema consists of five commonly used entity tags: person, characteristics, location, organisation, and facility. The labelling was carried out on the crowdsourcing platform Yandex.Toloka in two stages. First, workers selected sentences that contain an entity of a particular type. Second, they marked up entity spans. As a result, 1113 entities were obtained. Empirical evaluation of Razmecheno is carried out with off-the-shelf NER tools and by fine-tuning pre-trained contextualized encoders. We release the annotated dataset for open access.
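
To make the span-level annotation concrete, here is a minimal sketch of how marked-up entity spans could be turned into per-token BIO labels, the form commonly used when fine-tuning encoders for NER. The abstract does not specify the released file format or the tag strings, so the short tag names (`PER`, `CHAR`, `LOC`, `ORG`, `FAC`), the `spans_to_bio` helper, and the example sentence are all assumptions made for illustration; they are not taken from the dataset.

```python
from typing import List, Tuple

# The five entity types named in the abstract (person, characteristics,
# location, organisation, facility). The short tag strings are assumptions.
ENTITY_TAGS = {"PER", "CHAR", "LOC", "ORG", "FAC"}

# (start token index, end token index exclusive, tag)
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level entity spans to BIO labels (hypothetical helper)."""
    labels = ["O"] * len(tokens)
    for start, end, tag in spans:
        if tag not in ENTITY_TAGS:
            raise ValueError(f"unknown tag: {tag}")
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end)}")
        if any(label != "O" for label in labels[start:end]):
            raise ValueError("overlapping spans are not supported in this sketch")
        labels[start] = f"B-{tag}"
        for i in range(start + 1, end):
            labels[i] = f"I-{tag}"
    return labels


# Invented example sentence, not taken from the dataset.
tokens = ["Anna", "Petrova", "visited", "Moscow", "."]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
bio = spans_to_bio(tokens, spans)
print(list(zip(tokens, bio)))
```

The sketch rejects overlapping spans for simplicity; whether the dataset itself allows overlapping or nested entities is not stated in the abstract.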
