SeanNet: Semantic Understanding Network for Localization under Object Dynamics

ArXived:

Xiao Li, Yidong Du, Zhen Zeng, Chad Jenkins

We aim for domestic robots to perform long-term indoor service. Daily human activities induce object-level scene dynamics, so a robot must localize itself robustly despite uncertainty about the scene. Previous work has addressed vision-based localization in static environments, but object-level scene dynamics challenge existing methods during long-term deployment. This paper proposes a SEmantic understANding Network (SeanNet) architecture that enables effective learning from coupled visual and semantic inputs. Using a dataset that contains object dynamics, we propose a cascaded contrastive learning scheme that trains SeanNet to learn a vector scene embedding. We can then measure the similarity between the currently observed scene and the target scene, which enables robust localization under object-level dynamics. In our experiments, we benchmark SeanNet against state-of-the-art image-encoding networks (baselines) on scene similarity measures. The SeanNet architecture with the proposed training method achieves 85.02% accuracy, higher than the baselines. We further integrate SeanNet and the other networks as localizers into a visual navigation application and demonstrate that SeanNet achieves higher success rates than the baselines. [Github] [ArXiv]
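As a rough illustration of the similarity step described above, the sketch below compares a current scene embedding against a target scene embedding. It is not the SeanNet implementation: the abstract does not state which similarity measure is used, so cosine similarity, the function names, the 0.8 threshold, and the toy three-dimensional vectors are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def is_same_scene(current, target, threshold=0.8):
    """Hypothetical localizer check: has the robot reached the target scene?

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return cosine_similarity(current, target) >= threshold


# Toy embeddings (made up): the same room with some objects moved should stay
# close to the target, while a different room should not.
target_scene = [0.9, 0.1, 0.4]
same_room_objects_moved = [0.85, 0.2, 0.45]
different_room = [-0.2, 0.9, 0.1]

reached = is_same_scene(same_room_objects_moved, target_scene)
not_reached = is_same_scene(different_room, target_scene)
```

In a navigation loop, a check like `is_same_scene` would be evaluated on each new observation, with the embeddings produced by the trained network rather than hand-written vectors.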



Automated Coffee Making by a Domestic Robot