WORKDIR fetcher interaction issue


Richard Purdie
 

I was asked about a WORKDIR/fetcher interaction problem and the bugs it
results in. I've tried to write down my thoughts.

The unpack task writes its output to WORKDIR, as base.bbclass says:

fetcher = bb.fetch2.Fetch(src_uri, d)
fetcher.unpack(d.getVar('WORKDIR'))

We historically dealt with tarballs which usually have a NAME-VERSION
directory within them, so when you extract them, they go into a sub
directory which tar creates. We usually call that subdirectory "S".

When we wrote the git fetcher, we emulated this by using a "git"
directory to extract into rather than WORKDIR.

For local files, there is no sub directory so they go into WORKDIR.
This includes patches, which do_patch looks for in WORKDIR and applies
them from there.

What issues does this cause? If you have an existing WORKDIR and run a
build with:

SRC_URI = "file://a file://b"

then change it to:

SRC_URI = "file://a"

and rebuild the recipe, the fetch and unpack tasks will rerun and their
hashes will change but the file "b" is still in WORKDIR. Nothing in the
codebase knows that it should delete "b" from there. If you have code
which does "if exists(b)", which is common, it will break.
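
To make the failure concrete, here is a minimal sketch that simulates two unpack runs into the same directory. It does not use the real fetcher; unpack_local_files is a hypothetical stand-in that only mimics how file:// entries land directly in WORKDIR.

```python
import os
import tempfile

def unpack_local_files(src_uri, workdir):
    """Hypothetical stand-in for do_unpack: write each file:// entry into workdir."""
    for uri in src_uri.split():
        name = uri[len("file://"):]
        with open(os.path.join(workdir, name), "w") as f:
            f.write("contents of %s\n" % name)

# A temporary directory plays the role of an existing WORKDIR.
workdir = tempfile.mkdtemp()

# First build: both files are unpacked.
unpack_local_files("file://a file://b", workdir)

# SRC_URI is edited and unpack reruns; nothing removes "b".
unpack_local_files("file://a", workdir)

stale = os.path.exists(os.path.join(workdir, "b"))
print(sorted(os.listdir(workdir)), stale)  # ['a', 'b'] True
```

Any later check along the lines of "if exists(b)" still succeeds here, even though "b" is no longer part of SRC_URI.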

There are variations on this, such as a conditional append on some
override to SRC_URI but the fundamental problem is one of cleanup when
unpack is to rerun.

The naive approach is then to think "let's just delete WORKDIR" when
running do_unpack. There is the small problem of WORKDIR/temp with the
logs in it. There is also the pseudo database and anything else other
tasks could have written there. Basically, whilst tempting, it doesn't
work out well in practice, particularly as, whilst unpack might rerun,
not all other tasks might.

I did also try a couple of other ideas. We could fetch into a
subdirectory, then either copy or symlink into place depending on which
set of performance/usability challenges you want to deal with. You could
involve a manifest of the files and then move into position so later
you'd know which ones to delete.

Part of the problem is that in some cases recipes do:

S = "${WORKDIR}"

for simplicity. This means that you also can't wipe out S as it might
point at WORKDIR.

SPDX users have requested a JSON file of files and checksums after the
unpack and before do_patch. Such a manifest could also be useful for
attempting cleanup of an existing WORKDIR so I suspect the solution
probably lies in that direction, probably unpacking into a subdir,
indexing it, then moving into position.
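
As a rough sketch of that direction (not existing bitbake code), the following unpacks into a staging directory, indexes it, removes anything the previous manifest listed that is no longer present, then moves the new files into position. The function names, the manifest file name and its location are all assumptions made for illustration.

```python
import hashlib
import json
import os
import shutil
import tempfile

def index_tree(root):
    """Return {relative path: sha256 hex digest} for every file under root."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest

def install_unpacked(staging, workdir, manifest_path):
    """Move staged files into workdir, deleting files only the old manifest lists."""
    old = {}
    if os.path.exists(manifest_path):
        with open(manifest_path) as f:
            old = json.load(f)
    new = index_tree(staging)
    # Remove files a previous unpack put here that this unpack no longer provides.
    for rel in set(old) - set(new):
        stale = os.path.join(workdir, rel)
        if os.path.exists(stale):
            os.remove(stale)
    # Move the freshly unpacked files into position, replacing older copies.
    for rel in new:
        dest = os.path.join(workdir, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        if os.path.lexists(dest):
            os.remove(dest)
        shutil.move(os.path.join(staging, rel), dest)
    with open(manifest_path, "w") as f:
        json.dump(new, f, indent=2, sort_keys=True)
    return new

def demo():
    """Simulate SRC_URI going from 'a b' to 'a' with a log directory present."""
    workdir = tempfile.mkdtemp()
    os.makedirs(os.path.join(workdir, "temp"))  # stands in for WORKDIR/temp
    manifest_path = os.path.join(workdir, "unpack-manifest.json")  # assumed name
    for names in (["a", "b"], ["a"]):
        staging = tempfile.mkdtemp()
        for name in names:
            with open(os.path.join(staging, name), "w") as f:
                f.write(name)
        install_unpacked(staging, workdir, manifest_path)
        shutil.rmtree(staging)
    return sorted(os.listdir(workdir))

print(demo())  # ['a', 'temp', 'unpack-manifest.json']
```

Only files recorded in the previous manifest are ever deleted, so WORKDIR/temp and anything other tasks wrote are left alone. The sketch does not prune directories that become empty, and it does nothing special for the S = "${WORKDIR}" case.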

Personally, I'd also like to see S = "${WORKDIR}" deprecated and
dropped so that a subdir is always used, just to stop our code getting
too full of corner cases which are hard to maintain.

I've had a few experiments with variations on both issues on various
branches at different times; I just haven't had enough time to
socialise the changes, migrate code and handle the inevitable fallout.

Cheers,

Richard
