Publication date
03/24/2025
Authors
Marat Khamadeev, Nina Konovalova

Semantic alignment makes 3D model generation consistent


Diffusion models have firmly established themselves as a tool for image generation. They are also used for image editing, which we discussed in one of our previous blog posts. However, images are not the only modality these models can handle. Diffusion can also be used to generate 3D models from text prompts (text-to-3D) or from uploaded images (image-to-3D).

There are several approaches to this task. One of the most popular optimizes a 3D representation with a loss based on Score Distillation Sampling (SDS). This method reuses the knowledge of an ordinary image diffusion model, eliminating the need for large datasets of labeled 3D assets.
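
For reference, the SDS gradient is commonly written in the text-to-3D literature in the form below. The notation is an assumption made for illustration and is not taken from this post: x = g(θ) is an image rendered from the 3D representation with parameters θ, x_t is its noised version at timestep t, y is the text prompt, ε is the injected noise, ε̂_φ is the noise predicted by the pretrained diffusion model, and w(t) is a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = g(\theta)
```

Because the gradient flows only through the renderer g, the diffusion model itself stays frozen, which is why no 3D training data is required.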

Despite the progress made, 3D generation still faces several challenges. In particular, existing approaches struggle to keep objects created from semantically close prompts similar to one another. For example, when generating different characters, it is often impossible to get them into the same pose, even with identical random seeds.

Current methods generate 3D models of decent quality according to the prompts but lack structural consistency

This difficulty has been overcome by the A3D method, proposed by a team of researchers from several Russian and international research centers, including AIRI. The method is based on learning structure-preserving transformations between multiple objects.

To implement this idea, the authors conditioned the 3D generator on a linear combination of the latent code vectors corresponding to each prompt and also passed information about this combination into the loss function. This not only aligned the generated results with one another but also enabled smooth transitions from one result to another through linear interpolation.
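
The idea of a linear combination of latent codes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `blend_latents` and `interpolation_path`, the latent dimension of 8, and the random latent codes are all hypothetical, and the 3D generator that would consume the blended code is omitted entirely.

```python
import numpy as np


def blend_latents(latents, weights):
    """Return the linear combination sum_i weights[i] * latents[i].

    Hypothetical helper: `latents` has shape (n, d), one row per prompt,
    and `weights` has shape (n,).
    """
    latents = np.asarray(latents, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if latents.ndim != 2 or weights.shape != (latents.shape[0],):
        raise ValueError("expected latents of shape (n, d) and weights of shape (n,)")
    return weights @ latents


def interpolation_path(z_a, z_b, steps=5):
    """Latent codes evenly spaced on the straight segment from z_a to z_b."""
    pair = np.stack([z_a, z_b])
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([blend_latents(pair, np.array([1.0 - t, t])) for t in ts])


# Stand-in latent codes for two prompts (assumed to be 8-dimensional).
rng = np.random.default_rng(0)
z_a = rng.normal(size=8)
z_b = rng.normal(size=8)

path = interpolation_path(z_a, z_b, steps=5)
hybrid = path[2]  # midpoint: an equal mix of the two codes
```

In a setup like this, each row of `path` would be passed to the generator as its conditioning input, so the endpoints correspond to the two prompts and the middle rows to intermediate objects.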

This technique allows for both global and local editing of 3D models, with the latter achieved through pairwise or multiple generation. In addition, the new method can create hybrid objects at intermediate interpolation values, which can be useful in fields such as 3D animation or design.


Examples of editing using A3D in pairwise generation

The paper describing the method was accepted at the ICLR 2025 conference. The authors have also prepared a project page demonstrating the method.


