Due to the high computing power of modern computers and the increasing availability of data, reinforcement learning is gaining importance in many areas of industry. In safety-critical areas, however, reinforcement learning poses risks, which is why algorithms that promise a safe improvement over the behavior policy are of great interest. This master's thesis therefore examines the Soft-SPIBB algorithms introduced in "Safe Policy Improvement with Soft Baseline Bootstrapping" by Nadjahi et al. It reveals shortcomings in the mathematical theory underlying these algorithms, with the consequence that the theoretical safety bounds no longer apply to the original algorithms from the paper. This can be remedied by further restricting the algorithms, which leads to new algorithms for which the theoretical safety guarantees hold. In addition to adapting further algorithms that incorporate the uncertainty of state-action pairs into their computations, and implementing all of them in a unified Python framework, this thesis evaluates the algorithms on two different benchmarks. In particular, a heuristic adaptation of the original algorithms proves to be both very safe and performant.