Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation

Hang Li, Qian Feng, Zhi Zheng, Jianxiang Feng, Alois Knoll

doi:https://doi.org/10.48550/arXiv.2407.00451

Document type: Preprint
Author(s): Hang Li, Qian Feng, Zhi Zheng, Jianxiang Feng, Alois Knoll
Title: Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation
Abstract: Learning from demonstrations faces challenges in generalizing beyond the training data and is fragile even to slight visual variations. To tackle this problem, we introduce Lan-o3dp, a language-guided, object-centric diffusion policy that takes 3D representations of task-relevant objects as conditional input and can be guided by a cost function for safety constraints at inference time. Lan-o3dp enables strong generalization in various aspects, such as background changes and visual ambiguity, and can avoid novel obstacles that are unseen during the demonstration process. Specifically, we first train a diffusion policy conditioned on point clouds of target objects and then harness a large language model to decompose the user instruction into task-related units consisting of target objects and obstacles, which can be used as visual observations for the policy network or converted to a cost function, guiding the generation of trajectories towards collision-free regions at test time. Our proposed method shows training efficiency and higher success rates compared with the baselines in simulation experiments. In real-world experiments, our method exhibits strong generalization performance towards unseen instances, cluttered scenes, and scenes with multiple similar objects, and demonstrates training-free obstacle avoidance.
Journal title: arXiv
Year: 2024
Full text / DOI: https://doi.org/10.48550/arXiv.2407.00451
WWW: https://arxiv.org/pdf/2407.00451

Occurrences:

mediaTUM complete collection > Institutions > Schools > TUM School of Computation, Information and Technology > Departments > Computer Engineering > Informatik 6 - Lehrstuhl für Robotik, Künstliche Intelligenz und Echtzeitsysteme (Prof. Knoll) > 2024