Re: Stable release testing - notes from the autobuilder perspective


Richard Purdie

On Mon, 2020-09-07 at 17:19 -0400, Tom Rini wrote:
> On Mon, Sep 07, 2020 at 10:03:36PM +0100, Richard Purdie wrote:
> > On Mon, 2020-09-07 at 16:55 -0400, Tom Rini wrote:
> > The autobuilder is set up for speed so there aren't VMs involved, it's
> > 'baremetal'. Containers would be possible, but at that point the kernel
> > isn't the distro kernel and you have permission issues with the qemu
> > networking, for example.
>
> Which issues do you run into with qemu networking? I honestly don't
> know if the U-Boot networking tests we run via qemu under Docker are
> more or less complex than what you're running into.
It's the tun/tap device requirement that tends to be the pain point.
Being able to ssh from the host OS into the qemu target image is a
central requirement of oeqa. Everyone tells me it should use port
mapping and slirp instead to avoid the privilege problems and the
container issues, which is great but not implemented.
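
Roughly, the difference at the qemu level is this (just a sketch, not
the exact command line runqemu/oeqa builds; the tap name and port are
illustrative):

qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 \
    -netdev tap,id=net0,ifname=tap0,script=no,downscript=no

versus

qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 \
    -netdev user,id=net0,hostfwd=tcp:127.0.0.1:2222-:22

The tap variant needs a privileged, pre-configured tap device on the
host (the bit containers object to), while the slirp variant is
unprivileged and you'd ssh to the guest via localhost:2222.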

> > Speed is extremely important as we have about a 6 hour build test time
> > but a *massive* test range (e.g. all the gcc/glibc test suites on each
> > arch, build+boot test all the arches under qemu for sysvinit+systemd,
> > oe-selftest on each distro). I am already tearing my hair out trying to
> > maintain what we have and deal with the races; adding containers
> > into the mix simply isn't something I can face.
> >
> > We do have older distros in the cluster for a time, e.g. centos7 is
> > still there, although we've replaced the OS on some of the original
> > centos7 workers as the hardware had disk failures, so there aren't as
> > many of them as there were. Centos7 gives us problems trying to build
> > master.
> The reason I was thinking about containers is that it should remove some
> of what you have to face.

Removes some, yes, but creates a whole set of other issues.

> Paul may or may not want to chime in on how
> workable it ended up being for a particular customer, but leveraging
> CROPS to set up a build environment of a supported host and then running
> it on whatever the available build hardware is, was good. It sounds like
> part of the autobuilder problem is that it has to be a specific set of
> hand-crafted machines, and that in turn feels like we've lost the
> thread, so to speak,

The machines are in fact pretty much off-the-shelf distro installs, so
not hand-crafted.

> about having a reproducible build system. 6 hours
> even beats my U-Boot world before/after times, so I do get the dread of
> "now it might take 5% longer", which is very real extra wallclock time.
> But if it means more builders could be available as they're easy to spin
> up, that could bring the overall time down.

Here we get onto infrastructure, as we're not talking containers on our
workers but on general cloud systems, which is a different proposition.

We *heavily* rely on the fast network fabric between the workers and
our NAS for sstate (NFS mounted). This is where we get a big chunk of
the speed. So "easy to spin up" isn't actually the case, for different
reasons.
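
For context, on the workers that's basically just bitbake's sstate
cache pointed at the shared NFS export in local.conf, along these
lines (the paths are illustrative, not our actual layout):

SSTATE_DIR = "/nas/sstate"

or, as a read-only shared cache:

SSTATE_MIRRORS = "file://.* file:///nas/sstate/PATH"

Either way the builds are only fast if reading those objects over the
fabric is fast.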

> > So this plan is the best practical approach we can come up with to
> > allow us to be able to build older releases yet not change the
> > autobuilders too much and cause new sets of problems. I should have
> > mentioned this, I just assume people kind of know this, sorry.
>
> Since I don't want to put even more on your plate, what is the
> reasonable test to try here? Or is it hard to say, since it's not just
> "MACHINE=qemux86-64 bitbake world" but also "run this and that and
> something else"?
It's quite simple:

MACHINE=qemux86-64 bitbake core-image-sato-sdk -c testimage

and

MACHINE=qemux86-64 bitbake core-image-sato-sdk -c testsdkext

are the two to start with. If those work, the other "nasty" ones are
oe-selftest and the toolchain test suites. We also need to check that
kvm is working.
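
For the latter two, that's roughly along the lines of the following
(the test module picked is just an example, not the full set we run):

oe-selftest -r bbtests

(or oe-selftest -a for everything), plus a quick check on the worker
that /dev/kvm exists and is accessible:

ls -l /dev/kvm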

We have gone around in circles on this several times as you're not the
first to suggest it :/.

Cheers,

Richard
