This thesis presents the refactoring and extension of a prototype implementation of a data-driven solver selection workflow for sparse linear systems into a modular, reusable, and reproducible system. The original implementation, based on a Jupyter Notebook and Python scripts, is restructured to support modular design, consistent data handling, and interoperability.
To achieve this, the workflow is aligned with the scikit-learn developer API and with the Research Software Engineering and FAIR principles. A custom scikit-learn-compatible estimator is implemented, and the system is organized into a pipeline-based design that enables consistent preprocessing, training, inference, and evaluation while facilitating extensibility and versatility.
The refactored implementation preserves the behavior of the original prototype while enhancing modularity, reproducibility, and interoperability. It also extends the system’s capabilities by integrating with standard machine learning tooling, including cross-validation, hyperparameter optimization, and model persistence via joblib.
Overall, this work demonstrates that applying structured software engineering practices and aligning with established machine learning frameworks can improve the usability, extensibility, and sustainability of research code without altering its underlying methodology.