In this contribution, we develop a feedback controller for a wheeled inverted pendulum in the form of a neural
network that not only stabilizes the unstable system, but also
allows the wheeled robot to drive to arbitrary positions within
a certain radius and take a desired orientation, without the
need to compute a feasible trajectory to the desired position
online. While some techniques from the reinforcement learning
community can be used to optimize the parameters of a general
feedback controller, e.g. policy gradient methods, the method
used in this work is related to imitation learning or
learning from demonstration. The demonstration data, however,
do not come from a human demonstrator, but consist of a set
of precomputed optimal trajectories. The neural network is
trained to imitate the behavior of those optimal trajectories. We
show that a good choice of initial states and a large number of
training targets can alleviate a common problem of imitation
learning, namely deviation from the training trajectories, and we
demonstrate results in simulation as well as on the physical
system.