Source
Nucleic Acids Research
DATE OF PUBLICATION
04/29/2024
Authors
Mikhail Burtsev, Yuri Kuratov, Olga Kardymon, Veniamin Fishman, Aleksei Shmelev, Dmitry Penzar, Maxim Petrov, Nikolay Akhmetyanov, Maksim Tavritskiy, Stepan Mamontov

GENA-Web - GENomic Annotations Web Inference using DNA language models

Abstract

The advent of advanced sequencing technologies has significantly reduced the cost and increased the feasibility of assembling high-quality genomes. Yet, the annotation of genomic elements remains a complex challenge. Even for species with comprehensively annotated reference genomes, the functional assessment of individual genetic variants is not straightforward. In response to these challenges, recent breakthroughs in machine learning have led to the development of DNA language models. These transformer-based architectures are designed to tackle a wide array of genomic tasks with enhanced efficiency and accuracy. In this context, we introduce GENA-Web, a web-based platform that consolidates a suite of genome annotation tools powered by DNA language models. The version of GENA-Web presented here encompasses a diverse set of models trained on human data, covering prediction of promoter activity, annotation of splice sites, and determination of various chromatin features, as well as a model for scoring enhancer activity in Drosophila. GENA-Web is accessible online at https://dnalm.airi.net/
