In this theoretical report, we examine the integration of Reinforcement Learning (RL), particularly the
Proximal Policy Optimization (PPO) algorithm, within the context of flow shop scheduling. Our investi-
gation revolves around enhancing scheduling policies via priority functions with state-dependent weights
to address the dynamic challenges of modern production environments.
Our research highlights the effectiveness of adaptive priority functions in structured settings with mul-
tiple machines and stages. However, we also uncover that in larger action spaces with less structure,
heuristic methods seem to be more suitable. A pivotal finding is the significant variation in the PPO
algorithm’s performance across different configurations, with simpler feature sets leading to better out-
comes. This variation suggests that optimising these features and simplifying action spaces can enhance
computational efficiency without sacrificing results. Beyond these findings, our experiments
with continuous action spaces revealed further challenges: when applied to these larger, less
structured settings, the PPO algorithm did not outperform its discrete counterparts and
struggled with learning efficiency.
Ultimately, our study highlights the need for a simplified approach to feature selection when applying the
PPO algorithm to flow shop scheduling, in order to ensure effective learning and operational efficiency.
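To make the central idea of priority functions with state-dependent weights concrete, the sketch below shows one possible form of such a dispatching rule. It is not the report's implementation: the feature set, the shop-state vector, and the linear mapping standing in for the learned PPO policy are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed, not the report's code): a priority-based dispatching
# rule whose feature weights depend on the current shop state. In the report's
# setting, such weights would be produced by a trained PPO policy; here a fixed
# random linear mapping stands in for that policy.

rng = np.random.default_rng(0)

def job_features(jobs, t):
    """Per-job features: remaining processing time, work already done, slack to due date."""
    return np.array(
        [[j["remaining"], j["done"], j["due"] - t - j["remaining"]] for j in jobs],
        dtype=float,
    )

def shop_state(jobs, t):
    """Aggregate state the weights condition on: queue length, mean remaining work, time."""
    remaining = np.array([j["remaining"] for j in jobs], dtype=float)
    return np.array([len(jobs), remaining.mean(), t], dtype=float)

def state_dependent_weights(state, W, b):
    """Stand-in for the learned policy: maps the shop state to priority-function weights."""
    return W @ state + b

def dispatch(jobs, t, W, b):
    """Select the waiting job with the highest state-weighted priority score."""
    weights = state_dependent_weights(shop_state(jobs, t), W, b)
    scores = job_features(jobs, t) @ weights
    return int(np.argmax(scores))

# Example: three jobs waiting at one machine at time t = 5 (all values hypothetical).
jobs = [
    {"remaining": 4.0, "done": 2.0, "due": 15.0},
    {"remaining": 7.0, "done": 0.0, "due": 12.0},
    {"remaining": 2.0, "done": 5.0, "due": 20.0},
]
W, b = rng.normal(size=(3, 3)), rng.normal(size=3)  # would be learned via PPO
print("Selected job index:", dispatch(jobs, t=5.0, W=W, b=b))
```

Because the weights are recomputed from the shop state at every dispatching decision, the rule can behave differently under light and heavy load, which is the adaptivity the learned policy is meant to exploit.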