Multimodal 3D map reconstruction for intelligent robotics using neural network-based methods
Abstract
Methods for constructing multimodal 3D maps are becoming increasingly important for robot navigation systems. In such maps, each 3D point or object stores, in addition to its color and semantic category, compressed vector representations (embeddings) of a text description or sound. This makes it possible to navigate to objects in response to natural language queries, including queries that do not mention the target object explicitly. This article proposes an original taxonomy of neural network-based methods for constructing multimodal 3D maps. It shows that sparse methods, which represent the scene as an object graph and use large language models to answer spatial and semantic queries, achieve the most promising results on existing open benchmarks. Based on this analysis, the article formulates recommendations for choosing methods to solve practical problems in intelligent robotics.
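To make the "object graph" representation concrete, here is a minimal sketch of a sparse multimodal map: each node stores a semantic category, a 3D position, a color, and an embedding vector, and edges encode a simple "near" relation. The class names `MapObject` and `ObjectGraph`, the distance threshold, and the toy three-dimensional embeddings are all hypothetical; a real system would obtain embeddings from a vision-language or audio encoder (not shown). Note that the query step here is plain cosine-similarity retrieval, not the large-language-model reasoning over the graph that the abstract identifies as most promising.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MapObject:
    """One node of a hypothetical object-graph map (illustrative only)."""
    obj_id: int
    category: str                          # semantic class label
    position: tuple[float, float, float]   # object centroid in the map frame, metres
    rgb: tuple[int, int, int]              # mean object color
    embedding: list[float]                 # assumed precomputed multimodal feature vector


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


@dataclass
class ObjectGraph:
    nodes: dict[int, MapObject] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)  # undirected "near" relations

    def add(self, obj: MapObject) -> None:
        self.nodes[obj.obj_id] = obj

    def connect_near(self, radius: float) -> None:
        """Link every pair of objects whose centroids lie within `radius` metres."""
        ids = sorted(self.nodes)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                if math.dist(self.nodes[a].position, self.nodes[b].position) <= radius:
                    self.edges.add((a, b))

    def neighbors(self, obj_id: int) -> list[int]:
        """IDs of objects connected to `obj_id` by a "near" edge."""
        return sorted(b if a == obj_id else a for a, b in self.edges if obj_id in (a, b))

    def query(self, query_embedding: list[float]) -> MapObject:
        """Return the node whose embedding is most similar to the query embedding."""
        return max(self.nodes.values(), key=lambda o: cosine(o.embedding, query_embedding))


graph = ObjectGraph()
graph.add(MapObject(0, "sofa", (1.0, 0.0, 0.0), (120, 60, 40), [0.9, 0.1, 0.0]))
graph.add(MapObject(1, "lamp", (1.5, 0.5, 0.0), (250, 250, 200), [0.1, 0.9, 0.2]))
graph.add(MapObject(2, "fridge", (6.0, 3.0, 0.0), (230, 230, 230), [0.0, 0.2, 0.9]))
graph.connect_near(radius=1.0)

# Toy stand-in for the embedding of a query such as "somewhere I can read" (hypothetical)
target = graph.query([0.2, 0.8, 0.1])
print(target.category, graph.neighbors(target.obj_id))  # prints: lamp [0]
```

The query never names the lamp, yet the closest embedding resolves to it, and the graph edge then exposes the nearby sofa as spatial context. An LLM-based method would instead pass such nodes and relations to a language model to reason about the query.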