Inpainting Semantic and Depth Features to Improve Visual Place Recognition in the Wild
Visual place recognition is a core computer vision task concerned with identifying a location from an image taken there. State-of-the-art approaches rely heavily on RGB images, which are strongly affected by changes within the same scene, such as time of day, illumination, season, and the presence of dynamic objects (people, vehicles). As a result, images in the training dataset can differ substantially from those a user captures at the same place in a real application, which makes these approaches less effective. To address this problem, we propose a novel approach that uses only geometric information (the shapes of buildings, terrain, and trees, and their relative positions) obtained from depth and semantic maps that are inpainted to remove dynamic objects. In this paper, we study two versions of the pipeline: the first uses direct inpainting, and the second uses synthetic data to improve the inpainting process. Our best-performing model achieved 60.6% correct answers with synthetic refinement; with direct inpainting, it achieved 51.1%. These results suggest that our approach is an effective alternative to known algorithms and a promising avenue for future research in visual place recognition.
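To make the idea of removing dynamic objects from semantic and depth maps concrete, the following is a minimal sketch and not the pipeline studied in this paper. It assumes that the semantic map is an integer label image, that the label IDs in `DYNAMIC_IDS` are hypothetical placeholders for dynamic classes, and that a brute-force nearest-neighbor fill stands in for the actual inpainting model. The function names `dynamic_mask` and `fill_nearest` are illustrative only.

```python
import numpy as np

# Hypothetical label IDs for dynamic classes (e.g. person, vehicle);
# real IDs depend on the segmentation model and its label set.
DYNAMIC_IDS = (11, 13)

def dynamic_mask(semantic, dynamic_ids=DYNAMIC_IDS):
    """Return a boolean mask that is True where a pixel has a dynamic label."""
    return np.isin(semantic, dynamic_ids)

def fill_nearest(values, mask):
    """Replace masked pixels with the value of the nearest unmasked pixel.

    Brute-force stand-in for an inpainting model: cost grows with
    (number of holes) x (number of known pixels), so small maps only.
    """
    if not mask.any() or mask.all():
        return values.copy()
    known = np.argwhere(~mask)  # coordinates of pixels to copy from, shape (K, 2)
    holes = np.argwhere(mask)   # coordinates of pixels to fill, shape (H, 2)
    # Squared Euclidean distance from every hole to every known pixel.
    d2 = ((holes[:, None, :] - known[None, :, :]) ** 2).sum(axis=2)
    nearest = known[d2.argmin(axis=1)]
    out = values.copy()
    out[holes[:, 0], holes[:, 1]] = values[nearest[:, 0], nearest[:, 1]]
    return out

# Toy 3x3 scene: one "dynamic" pixel (label 11) in the centre.
semantic = np.array([[1, 1, 2],
                     [1, 11, 2],
                     [3, 3, 2]])
depth = np.array([[5.0, 5.0, 9.0],
                  [5.0, 1.5, 9.0],
                  [4.0, 4.0, 9.0]])

mask = dynamic_mask(semantic)
clean_semantic = fill_nearest(semantic, mask)  # dynamic label replaced by a static one
clean_depth = fill_nearest(depth, mask)        # foreground depth replaced by background depth
```

The same mask is applied to both maps so that the semantic and depth channels stay consistent after the dynamic object is removed.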