Benchmark for Building Segmentation on Up-Scaled Sentinel-2 Imagery
Abstract
Currently, we can solve a wide range of tasks using computer vision algorithms, which reduce manual labor and enable rapid analysis of the environment. The remote sensing domain provides vast amounts of satellite data, but it also poses challenges associated with processing this data. Baseline solutions with intermediate results are available for various tasks, such as forest species classification, infrastructure recognition, and emergency situation analysis using satellite data. Despite these advances, two major issues with high-performing artificial intelligence algorithms remain in the current decade. The first issue relates to the availability of data. To train a robust algorithm, a reasonable amount of well-annotated training data is required. The second issue is the availability of satellite data, which is another concern. Even though there are a number of data providers, high-resolution and up-to-date imagery is extremely expensive. This paper aims to address these challenges by proposing an effective pipeline for building segmentation that utilizes freely available Sentinel-2 data with 10 m spatial resolution. The approach we use combines a super-resolution (SR) component with a semantic segmentation component. As a result, we simultaneously consider and analyze SR and building segmentation tasks to improve the quality of the infrastructure analysis through medium-resolution satellite data. Additionally, we collected and made available a unique dataset for the Russian Federation covering area of 1091.2 square kilometers. The dataset provides Sentinel-2 imagery adjusted to the spatial resolution of 2.5 m and is accompanied by semantic segmentation masks. The building footprints were created using OpenStreetMap data that was manually checked and verified. Several experiments were conducted for the SR task, using advanced image SR methods such as the diffusion-based SR3 model, RCAN, SRGAN, and MCGR. The MCGR network produced the best result, with a PSNR of 27.54 and SSIM of 0.79. The obtained SR images were then used to tackle the building segmentation task with different neural network models, including DeepLabV3 with different encoders, SWIN, and Twins transformers. The SWIN transformer achieved the best results, with an F1-score of 79.60.
Similar publications
partnership