Multi-modal RGBD Attention Fusion for Dense Depth Estimation
With the development of autonomous vehicles and augmented reality devices, LiDAR sensors and cameras are becoming the main tools for object recognition. However, fusing information from multiple data sources remains a challenging task in computer vision. One of the most promising directions for such fusion is self-supervised training, yet previous works in this field have relied only on simple fusion mechanisms. In this paper, a novel model architecture with fusion blocks improved by attention mechanisms is presented, together with a comparison of the impact of input modalities and loss functions on the model. Experiments demonstrate the effectiveness of the presented fusion block at fusing LiDAR and camera data. The proposed neural network architecture and learning framework show promising results on the depth completion task.
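To make the idea of an attention-based fusion block concrete, the following is a minimal sketch of one common way such a block can combine sparse-depth (LiDAR) features with camera (RGB) features via scaled dot-product cross-attention. All names, shapes, and the residual-fusion choice are illustrative assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(rgb_feat, depth_feat):
    """Cross-attention fusion sketch: depth features attend to RGB features.

    rgb_feat, depth_feat: (N, d) arrays -- N spatial locations, d channels.
    (Illustrative only; a real block would use learned Q/K/V projections.)
    """
    d = rgb_feat.shape[1]
    # Scaled dot-product attention scores between modalities: (N, N).
    scores = depth_feat @ rgb_feat.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Aggregate RGB features weighted by cross-modal attention: (N, d).
    attended = weights @ rgb_feat
    # Residual fusion: keep the depth stream and add attended RGB context.
    return depth_feat + attended

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))
dep = rng.standard_normal((16, 8))
fused = attention_fuse(rgb, dep)
print(fused.shape)
```

Each output location mixes information from both modalities: the attention weights decide, per depth-feature location, which RGB locations contribute, which is the kind of learned (rather than fixed, e.g. concatenation-based) fusion the abstract contrasts with simpler mechanisms.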