Many machine learning tasks, such as facial recognition, medical image analysis, or speech recognition, require a large amount of training data, and the accuracy of the model depends in part on the quality of these data. Unfortunately, it is sometimes difficult to assemble a satisfactory training set. One solution is domain adaptation: the model is trained on a slightly different but richer dataset and then adapted to the target data. We have previously reported on how this approach speeds up the inference of generative adversarial networks.
Another approach is to synthesize images with generative models conditioned on some input, such as a sketch or a noisy image. Specialists call this translating data from one domain to another. Because the algorithm produces a counterpart for each input image, it is best trained on paired datasets. In practice, however, one usually has to work with unpaired datasets, where the input and output domains are linked by heuristics that lack mathematical rigor and must be tuned manually.
However, the problem of translating from one domain to another can be formulated rigorously as a transition from one probability distribution to another. Optimal transport (OT) theory and the algorithms that implement it are useful here. We previously reported on how a group of scientists from AIRI and Skoltech developed the Entropic Neural Optimal Transport method and created a benchmark for it and other similar algorithms.
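To make the "transition from one distribution to another" concrete, here is a minimal sketch of OT in its simplest discrete form. It assumes each domain is represented by the same number of points with uniform weights and uses a squared Euclidean cost; under these assumptions an optimal plan can be chosen as a permutation, so OT reduces to a linear assignment problem that SciPy's `linear_sum_assignment` solves exactly. This toy illustrates the OT problem itself, not the neural solvers discussed in the article; the function name `discrete_ot_map` and the data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def discrete_ot_map(x, y):
    """Monge OT map between two equal-size point clouds with uniform weights.

    Hypothetical helper for illustration. With uniform weights and the same
    number of points, an optimal plan can be taken to be a permutation, so OT
    reduces to a linear assignment problem.
    """
    # Pairwise squared Euclidean cost c(x_i, y_j) = ||x_i - y_j||^2
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    # For a square cost matrix, rows == [0, 1, ..., n-1], so cols[i] is the image of x_i
    return cols, cost[rows, cols].mean()


# Toy "domains": three source points and a shuffled set of three target points
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[2.1], [0.1], [1.1]])

perm, avg_cost = discrete_ot_map(x, y)
print(perm, avg_cost)
```

Here each source point is sent to its shifted copy (target indices 1, 2, 0), and `avg_cost` is the empirical transport cost of that matching.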
In parallel with this research, the group found that even greater similarity between input and output images can be achieved with specific formulations of the optimal transport problem. Inspired by recent progress in neural optimal transport, the team proposed a mathematical formalization of the theoretically best possible translation between domains with respect to a given similarity function, learned from unpaired training sets, which they called Extremal Transport (ET).
The authors created a scalable algorithm that approximates ET maps as a limit of partial OT maps, and demonstrated its advantages on toy examples and on unpaired image-to-image translation. Unlike its predecessors, the method better preserves the properties of the input object during domain translation and provides a mechanism for ignoring outliers in the target dataset.
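The idea of approaching ET through partial OT can be illustrated on a small discrete problem. The sketch below is an assumption-laden toy, not the authors' scalable neural algorithm: it solves a linear program in which every source point must ship all of its mass (1/n), but each target point may receive at most w times its own share (w/m). With w = 1 and equal-size clouds this is ordinary balanced OT, so every target point, including an outlier, must be covered. As w grows the capacity limits loosen, and once w ≥ m each source point simply goes to its nearest target point, a discrete analogue of the extremal map in which a distant outlier receives no mass. The function name `incomplete_ot_plan` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def incomplete_ot_plan(x, y, w):
    """Discrete partial OT plan: source mass 1/n per point, target capacity w/m per point.

    Hypothetical toy formulation for illustration only (not the authors' neural solver).
    """
    n, m = len(x), len(y)
    # Squared Euclidean cost c(x_i, y_j) = ||x_i - y_j||^2
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # The plan pi is flattened row-major: pi[i, j] -> variable i * m + j
    # Equality constraints: every source point ships exactly 1/n
    a_eq = np.kron(np.eye(n), np.ones((1, m)))
    b_eq = np.full(n, 1.0 / n)
    # Inequality constraints: every target point receives at most w/m
    a_ub = np.kron(np.ones((1, n)), np.eye(m))
    b_ub = np.full(m, w / m)
    res = linprog(cost.ravel(), A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, m)


# Toy data: three source points and a target set with one outlier at 10.0
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.1], [1.1], [10.0]])

plan_ot = incomplete_ot_plan(x, y, w=1.0)  # balanced OT: every target must be covered
plan_et = incomplete_ot_plan(x, y, w=3.0)  # w/m = 1: capacities no longer bind

print(plan_ot.argmax(axis=1))  # target index receiving each source point's mass
print(plan_et.argmax(axis=1))
print(plan_et[:, 2].sum())     # total mass sent to the outlier
```

In this toy run, raising w from 1 to 3 moves the third source point (2.0) from the outlier at 10.0 to its nearest neighbour at 1.1, so the outlier is ignored.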
Translation of handbags into shoes (left) and of celebrity faces into anime characters (right) with the new algorithm
The project code is available on GitHub, and details of the research can be found in the article published in the proceedings of the NeurIPS 2023 conference.