Reinforcement Learning is the most widely used class of learning
algorithms that enables robots and other systems to learn their
behaviour autonomously. Learning takes place solely through interaction
with the environment. Today’s learning systems are often confronted with
high-dimensional, continuous problems, and so-called Policy Gradient
methods are increasingly used to solve them.
The PGPE algorithm developed in this thesis, a new type of Policy
Gradient algorithm, enables model-free learning in complex, continuous,
partially observable and high-dimensional environments. We show that
tasks such as grasping glasses and plates with a human-like arm can
be learned with this method without prior knowledge, purely through
model-free reinforcement learning in a simulation environment. Likewise,
the balancing of a humanoid robot perturbed by external forces and the
dynamic walking behaviour of a mass-spring system could be learned.
In all experiments, PGPE learned the given tasks more efficiently
than well-established methods. Moreover, the use of PGPE is not
restricted to robotics: among several investigated methods, it was
the most successful at cracking non-differentiable physical cryptography
systems, and it is also suitable for training multidimensional recurrent
neural networks to play Go and for fine-tuning deep neural networks for
computer vision.
Within the scope of this thesis, the underlying principles, the
advantages and disadvantages, and the differences with respect to
well-established methods are derived and analysed in detail.