This paper studies the optimization of a biopharmaceutical production process at the interface between the upstream process (USP) and the downstream process (DSP). The process is originally modeled as a Markov decision process (MDP) with the goal of finding a profit-maximizing policy. We extend this model to a partially observed Markov decision process (POMDP) to account for uncertainty in the resin capacity and for latent characteristics of the product batch. We also introduce a new restriction requiring operations to be planned multiple days in advance. The base MDP model can be solved efficiently with value iteration (VI), yielding a minor improvement over the current industry standard. For the extended POMDP model, we develop VI-based procedures that estimate the unknown variables via Markov filtering. We derive a method, easily integrated into VI, for computing the optimal policy under the restriction that operations must be scheduled multiple days in advance. All policies are compared to a hypothetical optimal method with access to the hidden variables, whose performance serves as an upper bound for any policy. For the process parameters we analyzed, the VI-based solution yields 1.4% less profit than this optimum. A solution using deep reinforcement learning (DRL) performed worse than the aforementioned methods, yielding a 12.6% profit loss.