Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection

Mohamed, Sondos; Zimmer, Walter; Greer, Ross; Alaaeldin Ghita, Ahmed; Castrillón-Santana, Modesto; Trivedi, Mohan M.; Knoll, Alois C.; Carta, Salvatore Mario; Marras, Mirko

ahmed2024transfer


Title:: Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection
Document type:: Conference contribution
Type of conference contribution:: Text contribution / paper
Author(s):: Mohamed, Sondos; Zimmer, Walter; Greer, Ross; Alaaeldin Ghita, Ahmed; Castrillón-Santana, Modesto; Trivedi, Mohan M.; Knoll, Alois C.; Carta, Salvatore Mario; Marras, Mirko
Pages (contribution):: 18
Abstract:: Accurately detecting 3D objects from monocular images in dynamic roadside scenarios remains a challenging problem due to varying camera perspectives and unpredictable scene conditions. This paper introduces a two-stage training strategy to address these challenges. Our approach initially trains a model on the large-scale synthetic dataset, RoadSense3D, which offers a diverse range of scenarios for robust feature learning. Subsequently, we fine-tune the model on a combination of real-world datasets to enhance its adaptability to practical conditions. Experimental results of the Cube R-CNN model on challenging public benchmarks show a remarkable improvement in detection performance, with a mean average precision rising from 0.26 to 12.76 on the TUM Traffic A9 Highway dataset and from 2.09 to 6.60 on the DAIR-V2X-I dataset, when performing transfer learning. Code, data, and qualitative video results are available on the project website: https://roadsense3d.github.io.
Keywords:: Roadside Perception, Autonomous Driving, Dataset, Monocular 3D Perception, 3D Object Detection, Synthetic Dataset, Transfer Learning
Dewey Decimal Classification:: 000 Computer science, knowledge, systems
Editor:: ECVA
Conference / book title:: Proceedings of the 18th European Conference on Computer Vision ECCV 2024
Conference organizer:: Springer
Conference date:: 30 September 2024
Publisher / institution:: Springer-Verlag
Publication date:: 30.09.2024
Year:: 2024
Quarter:: 4th quarter
Year / month:: 2024-09
Month:: Sep
Pages:: 19
Indexed in:: Scopus; Web of Science
Reviewed:: yes
Language:: en
Form of publication:: WWW
TUM institution:: TUM School of Computation, Information and Technology
Format:: Text
CC license:: by, http://creativecommons.org/licenses/by/4.0
