This paper evaluates the use of Deep Reinforcement Learning to control special-purpose automated production systems, which are characterized by multiple end-effectors that are each actuated in only one or two axes. Because these systems have a large number of actuators, of which only a few affect the processing of a workpiece at any given time, they are challenging to learn. In this paper, Deep Q-Learning is applied to a small use case: sorting workpieces by color in a simulation of such a production system. The baseline algorithm is compared to four commonly used extensions: Double Q-Learning, Dueling Networks, Prioritized Experience Replay, and Hindsight Experience Replay. For the scope of this paper, simplifications are applied to the state and action spaces. While the baseline implementation of Deep Q-Learning correctly sorts the 30 workpiece combinations seen during training, it does not reliably generalize to unseen combinations within 35,000 training episodes. In contrast, the algorithm using all four considered extensions generalizes to 80 of the 81 possible workpiece combinations.