Efficient permutation testing of variable importance measures by the example of random forests

Hapfelmeier, Alexander; Hornung, Roman; Haller, Bernhard

doi:10.1016/j.csda.2022.107689




Title: Efficient permutation testing of variable importance measures by the example of random forests
Document type: Article
Author(s): Hapfelmeier, Alexander; Hornung, Roman; Haller, Bernhard
Abstract: Hypothesis testing of variable importance measures (VIMPs) is still the subject of ongoing research. This particularly applies to random forests (RF), for which VIMPs are a popular feature. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional assumptions are based on empirical evidence. Other formal tests under regularity conditions were derived analytically. But these approaches can be computationally expensive or even practically infeasible. This problem also occurs with non-parametric permutation tests, which are, however, distribution-free and can generically be applied to any kind of prediction model and VIMP. Embracing this advantage, it is proposed to use sequential permutation tests and sequential p-value estimation to reduce the computational costs associated with conventional permutation tests. These costs can be particularly high in case of complex prediction models. Therefore, RF's popular and widely used permutation VIMP (pVIMP) serves as a practical and relevant application example. The results of simulation studies confirm the theoretical properties of the sequential tests, that is, the type-I error probability is controlled at a nominal level and a high power is maintained with considerably fewer permutations needed compared to conventional permutation testing. The numerical stability of the methods is investigated in two additional application studies. In summary, theoretically sound sequential permutation testing of VIMP is possible at greatly reduced computational costs. Recommendations for application are given. A respective implementation for RF's pVIMP is provided through the accompanying R package rfvimptest. (c) 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Journal title: Comput Stat Data Anal
Year: 2023
Volume: 181
Full text / DOI: doi:10.1016/j.csda.2022.107689
Print ISSN: 0167-9473
TUM institution: Lehrstuhl für Allgemeinmedizin (Prof. Schneider); Lehrstuhl für Medizinische Informatik (Prof. Boeker)
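The abstract describes sequential permutation testing as a way to reach a decision with far fewer permutations than a conventional permutation test. The following is a minimal Python sketch of that general idea, not the implementation in rfvimptest and not necessarily one of the procedures evaluated in the article. It assumes an early-stopping rule in the style of Besag and Clifford's sequential Monte Carlo p-values: permuting stops as soon as `h` permuted statistics are at least as large as the observed one. The function names, the parameters `h` and `max_perms`, and the stand-in importance score (absolute sample covariance instead of a random forest pVIMP) are all hypothetical choices made for illustration.

```python
import random


def abs_covariance(x, y):
    """Stand-in importance score (assumption): |sample covariance| of x and y.

    A real application would compute a model-based VIMP here instead.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1))


def sequential_permutation_pvalue(x, y, statistic, h=10, max_perms=999, rng=None):
    """Estimate a permutation p-value with early stopping.

    Permutes the predictor x, recomputes the statistic, and stops once h
    permuted values are >= the observed value, or after max_perms permutations.
    Returns (p_value_estimate, permutations_used).
    """
    if rng is None:
        rng = random.Random()
    observed = statistic(x, y)
    exceedances = 0
    x_perm = list(x)  # work on a copy so the caller's data stays untouched
    for used in range(1, max_perms + 1):
        rng.shuffle(x_perm)
        if statistic(x_perm, y) >= observed:
            exceedances += 1
            if exceedances == h:
                # Early stop: the predictor looks unimportant.
                return h / used, used
    # Budget exhausted: conventional estimate with the +1 correction.
    return (exceedances + 1) / (max_perms + 1), max_perms


if __name__ == "__main__":
    rng = random.Random(1)
    y = [rng.gauss(0.0, 1.0) for _ in range(50)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(50)]       # unrelated to y
    signal = [value + rng.gauss(0.0, 0.5) for value in y]  # related to y
    for name, x in (("noise", noise), ("signal", signal)):
        p, used = sequential_permutation_pvalue(x, y, abs_covariance, rng=rng)
        print(f"{name}: p ~ {p:.3f} after {used} permutations")
```

The saving comes from unimportant predictors: their permuted statistics exceed the observed value often, so the loop tends to end after a small number of permutations, while clearly important predictors use the full budget. With an expensive model such as a random forest, each skipped permutation avoids one recomputation of the importance measure.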