Within the scope of highly automated driving, an important task is the automated control of vehicle dynamics for path following. For that, we examine two approaches. The first is Model Predictive Control (MPC), which takes hard constraints into consideration but remains challenging to parameterise. The second is Reinforcement Learning (RL), which promises high performance but lacks safety guarantees. To overcome the drawbacks of both approaches, we propose to combine the two frameworks, achieving safety and performance at the same time while avoiding the hand-tuning workload. To compensate for changes in the MPC environment, we define a novel Parameter-Varying MPC (PMPC) driven by a deep neural network, which is trained with state-of-the-art RL algorithms. The result is a cascaded control system: the PMPC determines the vehicle control actions, while RL supplies it with the best cost-function parameter set. Apart from the objectives formulated in the MPC cost function, our RL-driven PMPC is able to consider additional optimisation objectives formulated in a novel, general multi-objective cascaded Gaussian (MOCG) reward function. We formulate optimisation objectives for a tracking mode and for a comfort mode. Experimental results demonstrate that our approach trains autonomously and outperforms an MPC tuned by a human expert. The proposed approach is generic and can therefore also be applied to other control areas, such as nanoelectronic semiconductor production, autonomous robotic medical surgery, or autonomous guidance of spacecraft.
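To make the cascaded structure concrete, the following is a minimal, self-contained Python sketch, not the paper's implementation: the toy double-integrator plant, the one-step-horizon MPC, the random-search outer loop (standing in for the deep-RL training of the parameter-proposing network), and the single-objective Gaussian reward (standing in for the MOCG reward) are all illustrative assumptions.

```python
import numpy as np

# Inner loop: a toy "MPC" on a discrete double integrator with a hard
# input constraint. The outer loop proposes the cost weight q, mirroring
# the cascade: RL proposes cost-function parameters, PMPC acts.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # discrete double-integrator dynamics
B = np.array([[0.005], [0.1]])
u_max = 1.0                               # hard input constraint

def mpc_action(x, q):
    """One-step-horizon stand-in for the PMPC: minimize
    q*||x_next||^2 + u^2 subject to |u| <= u_max. The quadratic cost has
    a closed-form minimizer; clipping enforces the hard constraint."""
    u = -(q * B.T @ A @ x) / (q * B.T @ B + 1.0)
    return np.clip(u, -u_max, u_max)

def rollout(q, steps=100):
    """Run one episode and return a Gaussian-shaped tracking reward,
    a single-objective stand-in for the MOCG reward function."""
    x = np.array([[1.0], [0.0]])          # initial path deviation
    err = 0.0
    for _ in range(steps):
        u = mpc_action(x, q)
        x = A @ x + B @ u
        err += x[0, 0] ** 2
    return np.exp(-err)                   # reward in (0, 1]

# Outer loop: crude random search over q, standing in for deep RL.
rng = np.random.default_rng(0)
best_q, best_r = 1.0, rollout(1.0)
for _ in range(50):
    q = rng.uniform(0.1, 50.0)
    r = rollout(q)
    if r > best_r:
        best_q, best_r = q, r
print(f"selected cost weight q={best_q:.2f}, reward={best_r:.3f}")
```

In the paper's setting, the random search is replaced by a deep neural network trained with RL, and the inner controller is a full constrained MPC over a receding horizon; the sketch only illustrates the division of labour between the two layers.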