The success of deep reinforcement learning approaches for learning dexterous manipulation skills strongly hinges on the rewards assigned to actions during task execution. The usual approach is to handcraft the reward function, but due to the high complexity of dexterous manipulation, defining the reward demands a large engineering effort for each particular task. To avoid this burden, we use an inverse reinforcement learning (IRL) approach to automatically learn the reward function from samples obtained from demonstrations of desired behaviours. We have identified that rewards learned with existing IRL approaches are strongly biased towards demonstrated actions, owing to the scarcity of samples in the vast state-action space of dexterous manipulation applications. This significantly hinders performance, since reward estimates are unreliable in regions unexplored during demonstration. We use statistical tools for random sample generation and reward normalization to reduce this bias. We show that this approach improves the learning stability and transferability of IRL for dexterous manipulation tasks.
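The core idea of reducing the bias, normalizing the learned reward against rewards evaluated on randomly sampled actions, can be sketched as follows. This is a minimal illustration only, assuming a learned reward function and an action sampler; the names `reward_fn` and `sample_actions` are hypothetical placeholders, not the paper's actual implementation.

```python
import numpy as np

def normalized_reward(reward_fn, state, action, sample_actions, n_samples=64):
    """Hedged sketch: normalize a learned IRL reward against random samples.

    reward_fn(state, action) -> float   (reward recovered by IRL; assumed)
    sample_actions(n) -> iterable of n random actions from the action space (assumed)

    Comparing the reward of the evaluated action against the statistics of
    randomly sampled actions in the same state reduces the bias towards
    demonstrated actions in sparsely covered regions of the state-action space.
    """
    r = reward_fn(state, action)
    # Evaluate the learned reward on random actions drawn for the same state.
    random_rs = np.array([reward_fn(state, a) for a in sample_actions(n_samples)])
    mu, sigma = random_rs.mean(), random_rs.std() + 1e-8
    # Standardize so the reward scale is comparable across states.
    return (r - mu) / sigma
```

The standardization step is one plausible choice of normalization; any statistic computed over the randomly generated samples could serve the same purpose of making reward estimates comparable outside the demonstrated region.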