Refactoring and Modularization of a Machine Learning Pipeline for Solver Selection

Fernández Peralta, Nino Joel

Document type:: Bachelor's thesis
Author(s):: Fernández Peralta, Nino Joel
Title:: Refactoring and Modularization of a Machine Learning Pipeline for Solver Selection
Abstract:: This thesis presents the refactoring and extension of a prototype implementation of a data-driven solver selection workflow for sparse linear systems into a modular, reusable, and reproducible system. The original implementation, based on a Jupyter Notebook and Python scripts, is restructured to better support modular design, consistent data handling, and improved interoperability. To achieve this, the workflow is aligned with the scikit-learn developer API and the Research Software Engineering and FAIR principles. A custom scikit-learn compatible estimator is implemented, and the system is organized into a pipeline-based design that enables consistent preprocessing, training, inference, and evaluation while facilitating extensibility and versatility. The refactored implementation preserves the behavior of the original prototype while enhancing modularity, reproducibility, and interoperability. It further expands the system’s capabilities by enabling seamless integration with standard tools such as cross-validation, hyperparameter optimization, and model persistence via joblib. Overall, this work demonstrates that applying structured software engineering practices and aligning with established machine learning frameworks can improve the usability, extensibility, and sustainability of research code without altering its underlying methodology.
Supervisor:: Bungartz, Hans-Joachim
Advisor:: Liu Weng, Hayden
Year:: 2026
Quarter:: 1st quarter
Year / month:: 2026-03
Month:: Mar
Language:: en
University:: Technical University of Munich
Faculty:: TUM School of Computation, Information and Technology
