ReMem-VLA: Empowering Vision-Language-Action Model with Memory via Dual-Level Recurrent Queries

Title:: ReMem-VLA: Empowering Vision-Language-Action Model with Memory via Dual-Level Recurrent Queries
Document type:: Journal article
Author(s):: Li, Hang; Shen, Fengyi; Chen, Dong; Yang, Liudi; Wang, Xudong; Shi, Jinkui; Bing, Zhenshan; Liu, Ziyuan; Knoll, Alois
Year:: 2026
URL:: https://arxiv.org/abs/2603.12942
Notes:: Vision-language-action (VLA) models for closed-loop robot control are typically cast under the Markov assumption, making them prone to errors on tasks requiring historical context. To incorporate memory, existing VLAs either retrieve from a memory bank, which can be misled by distractors, or extend the frame window, whose fixed horizon still limits long-term retention. In this paper, we introduce ReMem-VLA, a Recurrent Memory VLA model equipped with two sets of learnable queries: frame-level ...