Regression testing aims to determine whether a change to a system introduces new bugs or can be merged safely. Flaky tests, which fail non-deterministically and independently of the change, can undermine this effort. In research and practice alike, it is often assumed that limited test execution resources can lead to test flakiness. We hypothesize that hitting resource limits during test execution changes timing behavior, which in turn promotes timing-related flakiness. However, there is no empirical evidence on whether hitting these resource limits increases the likelihood of test flakiness. To shed light on this, we created a dataset of 20 open-source projects for macOS, containing a total of 232 UI test cases, 23 of which are flaky. For all tests, we measured CPU usage continuously during test execution. We found that executions of flaky tests spend significantly more time at the CPU limit than executions of non-flaky tests. Contrary to our expectations, failing runs of flaky tests spend less time at the CPU limit than their passing runs.