In this paper we study Temporal Difference (TD) learning with linear value function approximation. The classic TD algorithm is known to be unstable when combined with linear function approximation and off-policy learning. Recently developed Gradient TD (GTD) algorithms address this problem successfully. Despite their appealing properties of good scalability and convergence to correct solutions, they inherit a potential weakness of slow convergence, as they are stochastic gradient descent algorithms. Accelerated stochastic gradient descent methods have been developed to speed up convergence while keeping computational complexity low. In this work, we develop an accelerated stochastic gradient descent method for minimizing the Mean Squared Projected Bellman Error (MSPBE), and derive a bound on the Lipschitz constant of the gradient of the MSPBE, which plays a critical role in our proposed accelerated GTD algorithms. Our comprehensive numerical experiments demonstrate promising performance on the policy evaluation problem in comparison to the GTD algorithm family. In particular, accelerated TDC surpasses state-of-the-art algorithms.
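To make the setting concrete, the sketch below combines the standard two-timescale TDC update for linear value function approximation with a Nesterov-style momentum term on the primary weights. The step sizes, momentum coefficient, and the `accelerated_tdc_sketch`/`transitions` interface are illustrative assumptions, not the accelerated TDC algorithm specified in the paper.

```python
import numpy as np

def accelerated_tdc_sketch(transitions, n_features, alpha=0.01, beta=0.05,
                           gamma=0.99, momentum=0.9):
    """Illustrative sketch only: TDC updates with a Nesterov-style momentum
    term on the primary weights. All hyperparameter values are hypothetical
    placeholders, not values taken from the paper."""
    theta = np.zeros(n_features)      # primary weights (value function)
    w = np.zeros(n_features)          # auxiliary weights (gradient correction)
    velocity = np.zeros(n_features)   # momentum buffer for theta

    for phi, reward, phi_next, rho in transitions:  # rho: importance weight
        # Nesterov lookahead point for the primary weights
        theta_look = theta + momentum * velocity

        # TD error evaluated at the lookahead point
        delta = reward + gamma * phi_next.dot(theta_look) - phi.dot(theta_look)

        # TDC gradient-correction direction at the lookahead point
        grad = rho * (delta * phi - gamma * phi_next * phi.dot(w))

        # Momentum update for theta; plain stochastic update for w
        velocity = momentum * velocity + alpha * grad
        theta = theta + velocity
        w = w + beta * rho * (delta - phi.dot(w)) * phi

    return theta
```

Each element of `transitions` is assumed to be a tuple `(phi, reward, phi_next, rho)` of feature vectors, reward, and importance-sampling ratio for one observed transition under the behavior policy.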