Release builds and intermittent failures - I'm close to breaking point

Richard Purdie

For many months a small group of people have tried to get our automated testing
to the point where we can make builds with reliable test results. We hold the
project to a high standard and the TSC (rightly?) doesn't allow builds with
failures to be released.

Unfortunately, I simply can't get a build without failures. rc1 went into QA
with permission from the TSC to ignore one issue as it was a well-known
intermittent failure as well as a warning. rc1 had other issues so we moved to
an rc2.

I tried an rc2 build last night in the hope that I could get it done and spend
the long weekend here in the UK doing something else.

This one failed with:

* systemd unit failed to start (avahi in qemuarm)
* reproducibility failure in alsa-tools
* warning for a valgrind ptest issue
* meta-aws warning for SRCREV_FORMAT for multiple git url usage

What really gets to me is that these are all known. We talked about the first
two in triage yesterday, and I think I was even persuaded to agree to close the
systemd one as it hadn't happened for a while, but I might be misremembering.

There are challenges in debugging and fixing intermittent issues but we simply
don't have people with the time to spend on them. The net result is we can't
move forward and make releases and I'm going crazy in the process.

Either we dive in and fix these and find a sustainable way for people to share
the load on issues even if they don't directly care about specific ones, or we
just drop the project quality bar.

I now face the choice of spending the long weekend here trying to do something
about these issues (and others raised on the list) or to try and ignore this
until Tuesday knowing we'll miss QA next week by the time they're addressed,
hence putting the release timing at risk.

I'm writing this down to make the situation clear as I think many people don't
even realise. Please share as appropriate to people making resourcing decisions.

The ask is simple: we need more people able to spend more time on general bug
fixing, as project quality is already suffering and will continue to suffer.

