Source
EMNLP
Date of publication
12/11/2023
Authors
Alexander Panchenko, Anton Razzhigaev, Denis Dimitrov, Anastasia Maltseva, Angelina Kuts, Vladimir Arkhipkin, Igor Pavlov, Arseniy Shakhmatov, Ilya Ryabov

Kandinsky: An Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Abstract

Text-to-image generation is a significant domain in modern computer vision and has achieved substantial improvements through the evolution of generative architectures. Among these, diffusion-based models have demonstrated notable quality enhancements. These models are generally split into two categories: pixel-level and latent-level approaches. We present Kandinsky, a novel exploration of the latent diffusion architecture that combines the principles of image prior models with latent diffusion techniques. The image prior model is trained separately to map text embeddings to the image embeddings of CLIP. Another distinct feature of the proposed model is the modified MoVQ implementation, which serves as the image autoencoder component. Overall, the designed model contains 3.3B parameters. We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variation generation, and text-guided inpainting/outpainting. Additionally, we released the source code and checkpoints for the Kandinsky models. Experimental evaluations demonstrate a FID score of 8.03 on the COCO-30K dataset, marking our model as the top open-source performer in terms of measurable image generation quality.
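The abstract describes a two-stage design: a separately trained image prior maps CLIP text embeddings to CLIP image embeddings, and a latent diffusion model generates latents conditioned on those embeddings, which a modified MoVQ autoencoder decodes into pixels. As a minimal sketch of how this two-stage pipeline can be run in practice, the snippet below uses the Hugging Face diffusers integration of Kandinsky 2.1; the diffusers API, the "kandinsky-community" checkpoint names, and all sampling parameters are assumptions not taken from the abstract.

```python
# Minimal sketch of Kandinsky's two-stage pipeline via Hugging Face diffusers.
# Assumption: the diffusers Kandinsky 2.1 integration and the
# "kandinsky-community" checkpoints; neither is named in the abstract.
import torch
from diffusers import KandinskyPriorPipeline, KandinskyPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stage 1: the image prior maps the text prompt (via CLIP text embeddings)
# to predicted CLIP image embeddings.
prior = KandinskyPriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior"
).to(device)
prompt = "a red cat sitting on a windowsill, watercolor"
prior_out = prior(prompt)

# Stage 2: the latent diffusion model generates latents conditioned on the
# predicted image embeddings; the MoVQ autoencoder decodes them to an image.
decoder = KandinskyPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1"
).to(device)
image = decoder(
    prompt,
    image_embeds=prior_out.image_embeds,
    negative_image_embeds=prior_out.negative_image_embeds,
    height=768,
    width=768,
    num_inference_steps=100,
).images[0]
image.save("kandinsky_sample.png")
```

Splitting the prior from the decoder is what enables the demo's other modes: image fusion and image variations condition the same decoder on CLIP image embeddings obtained or interpolated from inputs other than a single text prompt.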
