Title:

ReMem-VLA: Empowering Vision-Language-Action Model with Memory via Dual-Level Recurrent Queries

Document type:
Journal article
Author(s):
Li, Hang; Shen, Fengyi; Chen, Dong; Yang, Liudi; Wang, Xudong; Shi, Jinkui; Bing, Zhenshan; Liu, Ziyuan; Knoll, Alois
Year:
2026
WWW:
https://arxiv.org/abs/2603.12942
Notes:
Vision-language-action (VLA) models for closed-loop robot control are typically cast under the Markov assumption, making them prone to errors on tasks requiring historical context. To incorporate memory, existing VLAs either retrieve from a memory bank, which can be misled by distractors, or extend the frame window, whose fixed horizon still limits long-term retention. In this paper, we introduce ReMem-VLA, a Recurrent Memory VLA model equipped with two sets of learnable queries: frame-level ...