Vector Symbolic Scene Representation for Semantic Place Recognition
Most state-of-the-art methods for image-based place recognition do not explicitly use scene semantics. We address this problem and propose a new two-stage approach named TSVLoc. It treats place recognition as an image retrieval problem and can enrich any well-known method. In the first, model-agnostic stage, any modern neural network model that does not directly use semantics, e.g., HF-Net, NetVLAD, or Patch-NetVLAD, can be used. In the second stage, we apply the Vector Symbolic Architectures (VSA) framework to construct a semantic scene representation: our method uses semantic segmentation of an image to extract objects and their relations, and applies VSA operations to encode them into a single representation. Optionally, a depth map can be incorporated at this stage, which showed promising results. We demonstrate the effectiveness of our approach through extensive experiments on open large-scale datasets: the indoor HPointLoc dataset, built in the Habitat simulation environment, and the outdoor Oxford RobotCar dataset. The proposed solution significantly improves place recognition quality.
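To make the VSA encoding concrete, the following is a minimal sketch of the core operations (binding and bundling) on random bipolar hypervectors. The toy scene, symbol names, and the multiply/majority scheme are illustrative assumptions for exposition, not the exact encoding used by TSVLoc.

```python
import numpy as np

D = 10000  # hypervector dimensionality; high D keeps random symbols quasi-orthogonal
rng = np.random.default_rng(0)

def hv():
    """Random bipolar hypervector representing an atomic symbol."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding (elementwise product) associates a role with a filler."""
    return a * b

def bundle(*vs):
    """Bundling (elementwise majority via sign of the sum) superposes items."""
    return np.sign(np.sum(vs, axis=0))

def sim(a, b):
    """Cosine similarity between hypervectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical scene: semantic classes and spatial-relation roles as symbols.
left, right = hv(), hv()
car, tree, building = hv(), hv(), hv()

# Scene representation: superposition of role-filler bindings.
scene = bundle(bind(left, car), bind(right, tree))

# Unbinding (multiplication is self-inverse for bipolar vectors) recovers the
# filler of the "left" role, up to crosstalk noise from other bundled terms.
probe = bind(left, scene)
print(sim(probe, car) > sim(probe, tree))      # True: car was bound to "left"
print(sim(probe, car) > sim(probe, building))  # True: building is absent
```

Two such scene vectors can then be compared directly with cosine similarity, which is how a semantic representation can re-rank retrieval candidates from the first stage.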