3D Object Detection with a Self-supervised Lidar Scene Flow Backbone

Erçelik, Emeç; Yurtsever, Ekim; Liu, Mingyu; Yang, Zhijie; Zhang, Hanzhen; Topçam, Pınar; Listl, Maximilian; Kaan Çaylı, Yılmaz; Knoll, Alois

doi:https://doi.org/10.1007/978-3-031-20080-9_15


Title: 3D Object Detection with a Self-supervised Lidar Scene Flow Backbone
Document type: Conference paper
Author(s): Erçelik, Emeç; Yurtsever, Ekim; Liu, Mingyu; Yang, Zhijie; Zhang, Hanzhen; Topçam, Pınar; Listl, Maximilian; Kaan Çaylı, Yılmaz; Knoll, Alois
Abstract: State-of-the-art 3D detection methods rely on supervised learning and large labelled datasets. However, annotating lidar data is resource-consuming, and depending only on supervised learning limits the applicability of trained models. Against this backdrop, here we propose using a self-supervised training strategy to learn a general point cloud backbone model for downstream 3D vision tasks. 3D scene flow can be estimated with self-supervised learning using cycle consistency, which removes labelled data requirements. Moreover, the perception of objects in the traffic scenarios heavily relies on making sense of the sparse data in the spatio-temporal context. Our main contribution leverages learned flow and motion representations and combines a self-supervised backbone with a 3D detection head focusing mainly on the relation between the scene flow and detection tasks. In this way, self-supervised scene flow training constructs point motion features in the backbone, which help distinguish objects based on their different motion patterns used with a 3D detection head. Experiments on KITTI and nuScenes benchmarks show that the proposed self-supervised pre-training increases 3D detection performance significantly.
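The abstract mentions that scene flow can be learned without labels through cycle consistency. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: a forward flow warps the first point cloud toward the second, a backward flow should bring it back, and the remaining displacement serves as a loss. The function names, the nearest-neighbor term, and `toy_flow_model` (a hypothetical stand-in for a learned network) are assumptions made for illustration only.

```python
import numpy as np


def nearest_neighbor_loss(warped, target):
    """Mean squared distance from each warped point to its closest target point."""
    diff = warped[:, None, :] - target[None, :, :]  # shape (N, M, 3)
    sq_dist = np.sum(diff ** 2, axis=-1)            # shape (N, M)
    return float(np.mean(np.min(sq_dist, axis=1)))


def cycle_consistency_loss(points_t, flow_forward, flow_backward):
    """Mean squared gap between the original points and their forward-then-backward warp."""
    warped = points_t + flow_forward
    returned = warped + flow_backward
    return float(np.mean(np.sum((returned - points_t) ** 2, axis=-1)))


def toy_flow_model(source, target):
    """Hypothetical stand-in for a learned flow network (assumption).

    It moves every source point by the offset between the two cloud centroids,
    which is only correct for a pure translation.
    """
    shift = target.mean(axis=0) - source.mean(axis=0)
    return np.broadcast_to(shift, source.shape).copy()


# Two synthetic frames: the second is the first translated along x.
rng = np.random.default_rng(0)
points_t = rng.normal(size=(128, 3))
points_t1 = points_t + np.array([0.5, 0.0, 0.0])

flow_fwd = toy_flow_model(points_t, points_t1)  # frame t -> frame t+1
warped = points_t + flow_fwd
flow_bwd = toy_flow_model(warped, points_t)     # warped cloud -> frame t

loss_cycle = cycle_consistency_loss(points_t, flow_fwd, flow_bwd)
loss_nn = nearest_neighbor_loss(warped, points_t1)
print(f"cycle loss: {loss_cycle:.3e}, nearest-neighbor loss: {loss_nn:.3e}")
```

Neither loss term uses ground-truth flow or box annotations, which is what makes this kind of objective usable for label-free pre-training. In a real setup the toy model would be replaced by a trainable network and both terms would be minimized by gradient descent.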
Conference / Book title: European Conference on Computer Vision (ECCV)
Year: 2022
Full text / DOI: doi:https://doi.org/10.1007/978-3-031-20080-9_15
