Source: CLEF / ImageCLEF
Date of publication: 08/15/2024

MMCP Team at ImageCLEFmed MEDVQA-GI 2024 Task: Diffusion Models for Text-to-Image Generation of Colonoscopy Images

Abstract

This paper introduces the models we developed for the ImageCLEFmed 2024 MEDVQA-GI task, which aims to use text-to-image generative models to create a comprehensive dataset of artificial colonoscopy images from textual prompts. The task is challenging for three reasons: the provided training dataset is novel and largely unexplored, it is small, and the generated images must be highly specific. We explore multiple approaches, including efficient fine-tuning of large generative models such as Kandinsky and the adaptation of conditional latent Denoising Diffusion Probabilistic Models (DDPMs) to text-prompt conditioning. Our model achieved first place, with a Fréchet Inception Distance (FID) score close to 0.1 on the official test set, reflecting the high quality and realism of the generated images.
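The results above are reported as an FID score. To show what that number measures, here is a minimal NumPy sketch of the Fréchet distance between two Gaussians fitted to sets of feature vectors: d^2 = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). This is not the authors' evaluation code. Standard FID extracts the features with an Inception-v3 network, and that step is omitted here. The function names `frechet_distance` and `_sqrtm_psd` and the random stand-in features are hypothetical illustrations.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative values from round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T  # V diag(sqrt(lambda)) V^T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input has shape (n_samples, n_features). In real FID the features are
    Inception-v3 activations; this sketch accepts any feature matrix (assumption).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    # atleast_2d keeps the single-feature case a 1x1 matrix (np.cov squeezes it).
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((S_r S_g)^(1/2)) equals the sum of square roots of the eigenvalues of
    # S_r^(1/2) S_g S_r^(1/2), a symmetric PSD matrix with the same spectrum.
    sqrt_r = _sqrtm_psd(cov_r)
    middle = sqrt_r @ cov_g @ sqrt_r
    middle = (middle + middle.T) / 2.0  # enforce exact symmetry before eigvalsh
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()

    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    cov_term = float(np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)
    return mean_term + cov_term


# Usage with random stand-in features (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = rng.normal(loc=0.5, size=(500, 8))
print(frechet_distance(real, real))
print(frechet_distance(real, shifted))
```

Identical feature sets give a distance of zero up to round-off, and shifting the mean of one set increases it. Lower is therefore better, which is why a small FID is read as the generated images being statistically close to the real ones.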
