Automatic regression handling and reporting for the Linux Kernel

Ricardo Cañuelo Navarro
March 14, 2024

Continuing our series on Kernel Integration (check out part 1, part 2, and part 3), this post goes into more detail about how regression detection, processing, and tracking can be improved to provide a better service to developers and maintainers.

Traditionally, CI systems detect regressions automatically by running the same test cases on different versions of the software under test (in this case, the Linux kernel) and checking whether a test that used to pass starts failing after a specific kernel commit. In the ideal, straightforward case, this is enough to point to the commit that introduced the bug. The CI system can then generate a regression report and send it to a mailing list or to the appropriate maintainers and developers, if they can be deduced from the suspicious commit.
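
To make that detection step more concrete, here is a minimal sketch, assuming we have the results of one test case ordered by tested kernel revision. The names (`TestResult`, `find_regressions`) are hypothetical and not taken from any particular CI system.

```python
# Flag every pass-to-fail transition of one test case across the kernel
# revisions that were actually tested. Names here are illustrative only.
from dataclasses import dataclass

@dataclass
class TestResult:
    kernel_commit: str   # commit (or tag) the tested kernel was built from
    passed: bool         # outcome of this run

def find_regressions(history: list[TestResult]) -> list[tuple[str, str]]:
    """Given results ordered oldest to newest, return
    (last_passing_commit, first_failing_commit) pairs."""
    return [(prev.kernel_commit, curr.kernel_commit)
            for prev, curr in zip(history, history[1:])
            if prev.passed and not curr.passed]
```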

In practice, though, this ideal scenario is rare. Several circumstances make the process much harder in different ways:

  • Normally there's no guarantee that there will be a test run for every commit in the repository, so most of the time a reported regression points to a range of suspect commits rather than a single one (a sketch of how that range is obtained follows this list).
  • Tests that involve booting and running a machine are significantly more complicated than tests that simply run a software process in isolation. The more moving parts in a setup, the more things can go wrong.
  • As a result, not all test failures are caused by bugs introduced in the kernel.
  • Test code isn't infallible either; it can contain bugs of its own that may surface for multiple reasons.
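
As a rough illustration of the first point above: when only some revisions get tested, a detected regression initially maps to every commit between the last tested good build and the first tested bad one. The `git rev-list` invocation below is standard Git; the surrounding function is a hypothetical example.

```python
# List every commit between the last tested good build and the first tested
# bad one -- initially, all of them are suspects. Illustrative sketch only.
import subprocess

def suspect_commits(last_good: str, first_bad: str, repo: str = ".") -> list[str]:
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", f"{last_good}..{first_bad}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.split()
```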

As a consequence, a certain amount of human intervention is always needed when reporting a regression to the community. Normally this means doing some initial filtering of results, triaging them by importance and feasibility, narrowing down the possible causes, and providing additional information that isn't always evident from the data provided by the CI system.

There are obvious downsides to this process, the most important being that it doesn't scale: as the test space grows, more people are needed to keep up. Automating this process as much as possible is crucial to grow the kernel test ecosystem from a useful tool into an integral and prevalent part of the development workflow.

How can the appropriate tools help us with this task? Here are some ideas:

Post-processing of regression data

The information provided by a CI system about a regression is, most of the time, a snapshot of what happened with that test when it failed. However, further processing of that result and of other neighboring data over time can reveal information that's usually hidden from the naked eye. For instance:

  • Detection of unstable tests: when a test fails intermittently over different kernel versions, it's more likely that the test is unstable due to a bug in the test code, a timing issue, race conditions, or other external circumstances than that every pass-to-fail transition was caused by a commit introducing a bug. Smart filters and heuristics can help detect this type of scenario (a minimal one is sketched after this list).
  • Detection of configuration-specific, target-specific, or test-setup issues: collecting information about similar tests, about the same test on different kernel configurations, or about the same test on different target platforms may show that it failed only in a specific scenario, which helps a human inspector either filter out possible causes or narrow down the bug investigation.
  • Detection of known patterns in the test output: there's a myriad of post-processing options that can be applied to a test output log to categorize and detect specific issues. These range from simple text parsing that looks for known messages to automatically diagnose a failure (for example, a failure to boot because of a problem mounting the rootfs, or a timeout while waiting for a DHCP request), up to advanced ML-based analysis that profiles a bug from a console log so it can be matched against other known instances of the same (or a similar) bug in other regressions. The sketch after this list includes a simple pattern matcher of the first kind.
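
As a sketch of how the first and last ideas could look in practice, here are a naive flakiness heuristic and a signature-based log classifier. The threshold, the function names, and the signature list are assumptions made for the example rather than the behavior of any existing CI system (although "VFS: Unable to mount root fs" and "Kernel panic - not syncing" are real kernel messages).

```python
import re

def looks_flaky(outcomes: list[bool], max_flips: int = 2) -> bool:
    """A test whose verdict flips back and forth more than `max_flips` times
    over its recent history is more likely unstable than truly regressed."""
    flips = sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    return flips > max_flips

# Known failure signatures (regex -> category); extend as new patterns appear.
KNOWN_PATTERNS = {
    r"VFS: Unable to mount root fs": "rootfs mount failure",
    r"Kernel panic - not syncing": "kernel panic",
    r"(?i)timed out waiting for .*dhcp": "DHCP timeout (likely lab/network issue)",
}

def classify_log(console_log: str) -> list[str]:
    """Return the category of every known failure signature found in the log."""
    return [label for pattern, label in KNOWN_PATTERNS.items()
            if re.search(pattern, console_log)]
```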

Tracking the regression's life cycle

Even if the data provided by the CI systems included all of these improvements, there's still the issue of following up on the status of a reported regression.

Regressions are not static entities; they have a well-defined life cycle: they're detected, reported, and investigated, and then they're either dismissed as a non-issue (false positive, intended behavior, etc.) or fixed. The fixing process involves submitting a patch, reviewing it, testing it, merging it and, finally, checking that the regression has cleared up after the merge.
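
One possible way to make that life cycle explicit is to model it as a small state machine that a CI system could track and expose. The states and transitions below simply mirror the paragraph above; the code itself is a hypothetical sketch.

```python
from enum import Enum, auto

class RegressionState(Enum):
    DETECTED = auto()
    REPORTED = auto()
    INVESTIGATING = auto()
    NON_ISSUE = auto()        # false positive, intended behavior, ...
    PATCH_SUBMITTED = auto()
    PATCH_MERGED = auto()
    VERIFIED_FIXED = auto()   # regression confirmed gone after the merge

ALLOWED = {
    RegressionState.DETECTED:        {RegressionState.REPORTED},
    RegressionState.REPORTED:        {RegressionState.INVESTIGATING},
    RegressionState.INVESTIGATING:   {RegressionState.NON_ISSUE,
                                      RegressionState.PATCH_SUBMITTED},
    RegressionState.PATCH_SUBMITTED: {RegressionState.PATCH_MERGED},
    RegressionState.PATCH_MERGED:    {RegressionState.VERIFIED_FIXED},
}

def advance(current: RegressionState, new: RegressionState) -> RegressionState:
    """Move a regression to `new` only if the transition is allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"invalid transition {current.name} -> {new.name}")
    return new
```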

All of this happens with almost no visibility into who's working on what and at which stage of the process a regression is. Thorsten Leemhuis created regzbot to help with this: it keeps track of the status of reported regressions by checking mailing lists and repos automatically. A way to vastly improve on this would be to integrate these features into the CI systems themselves, so that anyone could get the current status of any discovered regression and update it as needed (a toy data model for this is sketched after the list below), answering common user questions like:

  • "Has anyone claimed and started to work on this regression?"
  • "Does this regression have an associated patch submitted already?"
  • "When was this fixed? where can I find a link to the patch review?"

Better integration of bisection processes with regressions

Bisections are already an important part of many CI systems and they provide an automatic way of pointing to the commit that caused a regression, assuming that the repo history is linear and that the test is stable.
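
Under those two assumptions (linear history, stable test), the core of a bisection is just a binary search over the suspect commit range. The sketch below is a bare-bones illustration; `build_and_test` stands in for whatever the CI system actually does to build a kernel at a given commit and run the failing test.

```python
from typing import Callable

def bisect(commits: list[str], build_and_test: Callable[[str], bool]) -> str:
    """`commits` is ordered oldest to newest; the first entry is known to
    pass and the last is known to fail. Return the first failing commit."""
    lo, hi = 0, len(commits) - 1           # commits[lo] passes, commits[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if build_and_test(commits[mid]):   # still passes: culprit is later
            lo = mid
        else:                              # already fails: culprit is here or earlier
            hi = mid
    return commits[hi]
```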

In some cases, however, bisections are triggered and managed as a process separate from testing. Making sure they are fully integrated into the test generation and report infrastructure would make it easier to match a regression with its related bisection process and vice versa. Anyone receiving a regression report could then check right away whether the regression has already been bisected and whether there's a good candidate commit to investigate. In the best case, if the test is stable and the bisection process is trustworthy, the result can be reported automatically to the commit author.

Conclusion

As we continue working on kernel regressions we keep finding ideas for improvements and new features. A big part of the effort is to bring these topics to the community, find a way of providing these features that's useful for all of us, and align the different projects in the ecosystem toward the same goal. Hopefully we'll get to a point where regression checking is seamlessly integrated into every kernel developer's workflow.
