causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery

Göbler, Konstantin; Windisch, Tobias; Drton, Mathias; Pychynski, Tim; Roth, Martin; Sonntag, Steffen

Title:: causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery
Document type:: Conference contribution
Type of conference contribution:: Talk / Presentation
Author(s):: Göbler, Konstantin; Windisch, Tobias; Drton, Mathias; Pychynski, Tim; Roth, Martin; Sonntag, Steffen
Pages:: 609--642
Abstract:: Algorithms for causal discovery have recently undergone rapid advances and increasingly draw on flexible nonparametric methods to process complex data. With these advances comes a need for adequate empirical validation of the causal relationships learned by different algorithms. However, for most real and complex data sources true causal relations remain unknown. This issue is further compounded by privacy concerns surrounding the release of suitable high-quality data. To tackle these challenges, we introduce causalAssembly, a semisynthetic data generator designed to facilitate the benchmarking of causal discovery methods. The tool is built using a complex real-world dataset comprised of measurements collected along an assembly line in a manufacturing setting. For these measurements, we establish a partial set of ground truth causal relationships through a detailed study of the physics underlying the processes carried out in the assembly line. The partial ground truth is sufficiently informative to allow for estimation of a full causal graph by mere nonparametric regression. To overcome potential confounding and privacy concerns, we use distributional random forests to estimate and represent conditional distributions implied by the ground truth causal graph. These conditionals are combined into a joint distribution that strictly adheres to a causal model over the observed variables. Sampling from this distribution, causalAssembly generates data that are guaranteed to be Markovian with respect to the ground truth. Using our tool, we showcase how to benchmark several well-known causal discovery algorithms.
Keywords:: Causal discovery, benchmarking, production data, distributional random forest
Dewey Decimal Classification:: 510 Mathematics
Publisher:: MLResearchPress
Conference / Book title:: Proceedings of Machine Learning Research
Conference / Additional information:: Third Conference on Causal Learning and Reasoning
Volume:: 236
Conference date:: April 1-3, 2024
Publication date:: 19 March 2024
Year:: 2024
Quarter:: 1st quarter
Year / Month:: 2024-03
Month:: Mar
E-ISSN:: 2640-3498
Language:: en
Publication format:: WWW
WWW:: Proceedings of Machine Learning Research
Semester:: Winter semester 2023/24
TUM institution:: Chair of Mathematical Statistics
Format:: Text
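
Illustrative note on the abstract: the sampling idea it describes (one conditional distribution per variable given its parents in the ground truth graph, combined into a joint distribution and sampled from) can be sketched as ancestral sampling over a directed acyclic graph. The sketch below is a toy illustration under stated assumptions, not the causalAssembly API. The three variable names, the simulated training data, and the nearest-neighbour resampler are all hypothetical; the resampler is only a crude stand-in for the distributional random forests named in the abstract.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical toy DAG (node -> list of parents); not the assembly-line graph.
PARENTS = {"pressure": [], "force": ["pressure"], "torque": ["pressure", "force"]}


def make_training_data(n, rng):
    """Simulate stand-in 'observed' measurements (assumption: made-up mechanisms)."""
    rows = []
    for _ in range(n):
        p = rng.gauss(0.0, 1.0)
        f = 2.0 * p + rng.gauss(0.0, 0.5)
        t = p - f + rng.gauss(0.0, 0.5)
        rows.append({"pressure": p, "force": f, "torque": t})
    return rows


def sample_conditional(node, parent_values, data, rng, k=10):
    """Draw node | parents by resampling among the k nearest training rows.

    A simple nonparametric stand-in, NOT a distributional random forest.
    """
    parents = PARENTS[node]
    if not parents:
        # No parents: resample from the observed marginal.
        return rng.choice(data)[node]

    def dist(row):
        # Squared distance measured in the parent coordinates only.
        return sum((row[p] - parent_values[p]) ** 2 for p in parents)

    nearest = sorted(data, key=dist)[:k]
    return rng.choice(nearest)[node]


def sample_joint(data, n, seed=0):
    """Ancestral sampling: visit nodes in a topological order of the DAG."""
    rng = random.Random(seed)
    order = list(TopologicalSorter(PARENTS).static_order())
    samples = []
    for _ in range(n):
        row = {}
        for node in order:
            # Parents were sampled earlier in the order, so row holds them.
            row[node] = sample_conditional(node, row, data, rng)
        samples.append(row)
    return samples


if __name__ == "__main__":
    training = make_training_data(200, random.Random(1))
    for sample in sample_joint(training, 3):
        print(sample)
```

Because each value is drawn using only the values of that variable's parents, the sampled distribution factorizes along the graph, which is the sense in which such data are Markovian with respect to it.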