The thesis investigates deep reinforcement learning, specifically Proximal Policy Optimization (PPO), for online scheduling to minimise the schedule’s makespan. Invalid action masking was applied incrementally together with potential-based and auxiliary reward shaping to assess their individual and combined effects on the performance of the deep reinforcement learning algorithm. Hyperparameter tuning using Bayesian Optimization and Hyperband was carried out to find the optimal configuration of hyperparameters, including the neural network architecture. The PPO algorithm
was trained and tested on three flow shop production layouts with two, four, and eight stages.
The study found that invalid action masking was necessary for the PPO algorithm to solve the
scheduling problem. Furthermore, potential-based and auxiliary reward shaping were found to improve the performance of PPO, resulting in the algorithm outperforming the shortest processing time (SPT) heuristic for the two- and four-stage layouts. However, the advantage of using deep reinforcement learning was only slight. One may argue that the SPT heuristic would be preferred in a practical setting, as it is easier to implement, more transparent in its decisions, and does not require training. In the eight-stage production layout, the PPO algorithm did not outperform the SPT heuristic, even with invalid action masking and reward shaping.