Release builds and intermittent failures - I'm close to breaking point
Richard Purdie
For many months a small group of people have tried to get our automated testing to the point where we can make builds with reliable test results. We hold the project to a high standard and the TSC (rightly?) doesn't allow a build with failures to be released. Unfortunately, I simply can't get a build without failures.

rc1 went into QA with permission from the TSC to ignore one issue, as it was a well known intermittent failure, as well as a warning. rc1 had other issues so we moved to an rc2. I tried an rc2 build last night in the hope that I could get it done and spend the long weekend here in the UK doing something else. This one failed with:

* systemd unit failed to start (avahi in qemuarm)
  https://autobuilder.yoctoproject.org/typhoon/#/builders/110/builds/3953
* reproducibility failure in alsa-tools
  https://autobuilder.yoctoproject.org/typhoon/#/builders/117/builds/691
* warning for a valgrind ptest issue
* meta-aws warning for SRCREV_FORMAT for multiple git url usage

What really gets to me is that these are all known. We talked about the first two in triage yesterday, and I think I was even persuaded to agree to close the systemd one as it hadn't happened for a while, but I might be misremembering.

There are challenges in debugging and fixing intermittent issues, but we simply don't have people with the time to spend on them. The net result is that we can't move forward and make releases, and I'm going crazy in the process. Either we dive in and fix these and find a sustainable way for people to share the load on issues, even if they don't directly care about specific ones, or we just drop the project quality bar.

I now face the choice of spending the long weekend here trying to do something about these issues (and others raised on the list), or trying to ignore this until Tuesday, knowing we'll miss QA next week by the time they're addressed and so put the release timing at risk.

I'm writing this down to make the situation clear, as I think many people don't even realise. Please share as appropriate with people making resourcing decisions.
The ask is simple: we need more people able to spend more time on general bug fixing, as the project quality is already suffering and will continue to suffer.

Richard