Source
ICONIP
YEAR OF PUBLICATION
2023
Authors
Dmitry Yudin, Aleksandr Khorin, Tatiana Zemskova, Darya Ovchinnikova

TASFormer: Task-Aware Image Segmentation Transformer

Abstract

In real-world image segmentation, the number of semantic categories can be very large, and the number of objects per category can vary greatly. In such cases, a multi-channel output mask is an inefficient representation for a segmentation model. In this paper, we explore ways to overcome this problem by using a single-channel output mask together with additional input information about the class to be segmented. We call this information a task embedding and learn it during training of the neural network. In our setting, the number of tasks equals the number of segmentation categories. This approach makes it possible to build universal models that extend to an arbitrary number of categories without changing the network architecture. To investigate this idea, we developed a transformer-based segmentation model named TASFormer. We show that the best task-aware segmentation quality is obtained when adapters are used as part of the model. To evaluate segmentation quality, we introduce a binary intersection over union (bIoU) metric, an adaptation of the standard mIoU to models with a single-channel output. We analyze its distinguishing properties and use it to compare modern neural network methods. The experiments were carried out on the universal ADE20K dataset, where the proposed TASFormer-based approach demonstrated state-of-the-art segmentation quality. The software implementation of the TASFormer method and the bIoU metric is publicly available at www.github.com/subake/TASFormer.
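
The idea of scoring a single-channel, task-conditioned output can be illustrated with a small sketch. The abstract does not give the exact bIoU formula, so the code below is only one plausible reading: for each queried category (task), the predicted binary mask is compared with the binary ground truth for that category, and the per-category IoU values are averaged. The function names binary_iou and mean_binary_iou, the handling of empty masks, and the averaging scheme are all assumptions made for illustration, not the authors' implementation; the repository linked above is the authoritative reference.

```python
from typing import Dict, List

Mask = List[List[int]]  # 2-D grid of 0/1 values (hypothetical representation)


def binary_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks with the same shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: if both masks are empty, treat the prediction as perfect.
    return 1.0 if union == 0 else inter / union


def mean_binary_iou(preds: Dict[int, Mask], label_map: List[List[int]]) -> float:
    """Average binary IoU over the queried categories.

    preds maps a category id (the "task") to the single-channel mask
    produced for that category; label_map holds one category id per pixel.
    """
    if not preds:
        raise ValueError("at least one category must be queried")
    scores = []
    for category, pred in preds.items():
        # Binarize the ground truth for the category being queried.
        target = [[1 if v == category else 0 for v in row] for row in label_map]
        scores.append(binary_iou(pred, target))
    return sum(scores) / len(scores)


# Toy example: a 2x3 image with two categories (0 and 1).
label_map = [[0, 0, 1],
             [0, 1, 1]]
preds = {
    0: [[1, 1, 0], [1, 0, 0]],  # matches category 0 exactly
    1: [[0, 1, 1], [0, 1, 1]],  # one false-positive pixel for category 1
}
score = mean_binary_iou(preds, label_map)
print(score)
```

In this toy case the category 0 mask is exact and the category 1 mask has one extra pixel (3 pixels in the intersection, 4 in the union), so the two per-category scores are 1.0 and 0.75. Because each category is scored from its own binary mask, adding a new category only adds another entry to preds and leaves the output shape unchanged, which mirrors the extensibility argument made in the abstract.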
