Upstream branch naming changes breaking source mirrors


Khem Raj
 

On 7/7/20 6:33 PM, hongxu wrote:
On 7/7/20 7:42 PM, Richard Purdie wrote:
A number of upstream git repos we build from are transitioning "master"
branches to "main" branches. They're doing this and removing the old
name.
Yes, we've met the same issue twice,
iso-codes: switch upstream branch master -> main
libmodulemd: switch branch master -> main
Ideally, we could support both master and main as the default branch:
if either of them exists, it should work. But the effect of the fix
is broad; we need to consider git, gitsm, lfs.
//Hongxu

The scale of the problem this causes us is only just becoming apparent.
iso-codes did this, I tested a patch to update master-next. Everything
was fine until I did this as DL_DIR has "master" in it.

After I tested the change in master-next, the new main branch was added
to DL_DIR and the old master branch was removed. This broke master
which now no longer had the correct source in the mirror or from
upstream. It will have also broken dunfell and perhaps a number of
other releases. Those do have sources from the release period we could
use but they're not configured to fall back to them at present (they
probably should be?).

I suspect this is going to be a growing trend so we're going to have to
adapt our mirroring to cope better with this, perhaps by not removing
any branches/heads, only ever updating/changing.

I wanted to give people a heads up that this is going to be an
increasing problem.
I think we should stop assuming a default and require branch= to be explicit; perhaps that will avoid future issues from such implicit changes.
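
As a rough illustration of what "require branch= to be explicit" could look like as a check, here is a minimal sketch. It is not the bitbake fetcher's own parser: the helper names (parse_git_params, has_explicit_branch) and the example.com URLs are made up, and it only assumes that SRC_URI entries carry ;key=value parameters such as branch= and nobranch=1.

```python
def parse_git_params(src_uri_entry):
    """Split 'scheme://host/path;key=value;...' into (url, params)."""
    url, *raw_params = src_uri_entry.split(";")
    params = {}
    for item in raw_params:
        key, _, value = item.partition("=")
        params[key] = value
    return url, params


def has_explicit_branch(src_uri_entry):
    """True if a git entry names a branch or opts out with nobranch=1."""
    url, params = parse_git_params(src_uri_entry)
    if not url.startswith(("git://", "gitsm://")):
        return True  # not a git fetch, so there is nothing to check
    return "branch" in params or params.get("nobranch") == "1"


# Hypothetical entries, for illustration only.
entries = [
    "git://example.com/project.git;protocol=https;branch=main",
    "git://example.com/project.git;protocol=https",
]
missing = [e for e in entries if not has_explicit_branch(e)]
```

Here `missing` ends up holding only the second entry, the one that silently relies on whatever default the fetcher assumes.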

Cheers,

Richard




hongxu
 

On 7/7/20 7:42 PM, Richard Purdie wrote:
A number of upstream git repos we build from are transitioning "master"
branches to "main" branches. They're doing this and removing the old
name.

Yes, we've met the same issue twice:

iso-codes: switch upstream branch master -> main
libmodulemd: switch branch master -> main

Ideally, we could support both master and main as the default branch:
if either of them exists, it should work. But the effect of the fix
is broad; we need to consider git, gitsm, lfs.

//Hongxu
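
A minimal sketch of the "either name works" idea, under the assumption that the set of branch names on the remote is already known (for example from `git ls-remote --heads`). The function name pick_default_branch and the candidate order are hypothetical, not anything the fetcher does today.

```python
def pick_default_branch(remote_heads, candidates=("master", "main")):
    """Return the first candidate branch that exists upstream, else None."""
    for name in candidates:
        if name in remote_heads:
            return name
    return None


# Upstream that has already renamed its branch (illustrative data).
renamed = pick_default_branch({"main", "release-1.0"})
# Upstream that still uses the old name.
legacy = pick_default_branch({"master"})
# Upstream with neither name: the caller has to decide what to do.
neither = pick_default_branch({"develop"})
```

The same lookup would have to be wired into each of git, gitsm and lfs handling, which is why the change is not a small one.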

The scale of the problem this causes us is only just becoming apparent.
iso-codes did this, I tested a patch to update master-next. Everything
was fine until I did this as DL_DIR has "master" in it.

After I tested the change in master-next, the new main branch was added
to DL_DIR and the old master branch was removed. This broke master
which now no longer had the correct source in the mirror or from
upstream. It will have also broken dunfell and perhaps a number of
other releases. Those do have sources from the release period we could
use but they're not configured to fall back to them at present (they
probably should be?).

I suspect this is going to be a growing trend so we're going to have to
adapt our mirroring to cope better with this, perhaps by not removing
any branches/heads, only ever updating/changing.

I wanted to give people a heads up that this is going to be an
increasing problem.

Cheers,

Richard



Christopher Clark
 

On Tue, Jul 7, 2020 at 8:54 AM Mark Hatle
<mark.hatle@...> wrote:

On 7/7/20 9:16 AM, Richard Purdie wrote:
On Tue, 2020-07-07 at 08:58 -0500, Joshua Watt wrote:

On 7/7/20 6:42 AM, Richard Purdie wrote:
A number of upstream git repos we build from are transitioning "master"
branches to "main" branches. They're doing this and removing the old
name.

The scale of the problem this causes us is only just becoming apparent.
iso-codes did this, I tested a patch to update master-next. Everything
was fine until I did this as DL_DIR has "master" in it.

After I tested the change in master-next, the new main branch was added
to DL_DIR and the old master branch was removed. This broke master
which now no longer had the correct source in the mirror or from
upstream. It will have also broken dunfell and perhaps a number of
other releases. Those do have sources from the release period we could
use but they're not configured to fall back to them at present (they
probably should be?).
I'm a little confused; is the old SHA1 not an ancestor of the new
branch head? I would have expected the required SHA1 to be in DL_DIR
just the same as if the master branch head had moved?
The fetcher is strict about which branch the SHA1 is on. There were
good reasons we started enforcing that, I have to admit I don't
remember the reasons offhand. It's that which is tripping things up
though.
One of the reasons we were enforcing this: there were people doing package
updates, calling a package one version and pulling the source from a completely
different branch.

It was making it difficult to properly name the version, look for CVEs, etc.
In case an additional example of branch specifier tripping up the
build when the upstream repo changes is useful, here's one:
the linux-raspberrypi recipe in meta-raspberrypi builds from a Linux
kernel tree posted on github, where the branch that is maintained for
a given kernel major version (eg. "rpi-5.4.y") is regularly
force-pushed to by the maintainers as their standard practice, with a
massive delta from what was there previously. There appear to be no
regularly issued tags or releases viable to switch over to either.

The commit referenced in the linux-raspberrypi SRCREV is still present
in the repository - and I'm not sure whether that can be relied upon
indefinitely - but it's no longer on any branch (so that includes not
being on the branch that is specified in the recipe), and so building the
recipe fails.
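
To make the failure mode concrete, here is a toy model of an "is this commit on the named branch" check. It is only a sketch and not the fetcher's actual code: the commit graph is a plain dict of parent links, the commit ids are single letters, and is_ancestor / branches_containing are made-up names. In a real repository, `git merge-base --is-ancestor` answers the same question.

```python
def is_ancestor(parents, commit, tip):
    """Walk parent links from tip; True if commit is reachable from it."""
    seen = set()
    stack = [tip]
    while stack:
        current = stack.pop()
        if current == commit:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def branches_containing(parents, heads, commit):
    """Names of all branch heads from which commit is reachable."""
    return sorted(
        name for name, tip in heads.items() if is_ancestor(parents, commit, tip)
    )


# Toy history: a <- b <- c is the old line, a <- x is the rewritten one.
parents = {"b": ["a"], "c": ["b"], "x": ["a"]}
heads_before = {"rpi-5.4.y": "c"}
heads_after = {"rpi-5.4.y": "x"}  # branch force-pushed to a new history

pinned = "c"  # stands in for the SRCREV in the recipe
on_branch_before = branches_containing(parents, heads_before, pinned)
on_branch_after = branches_containing(parents, heads_after, pinned)
```

Before the force-push the pinned commit is found on rpi-5.4.y; afterwards the list is empty even though the commit object still exists, which is the situation the recipe is in.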

Christopher

Is this caused because the SRC_URI specifies a branch and the fetcher
only downloads that specific branch from upstream? If a branch is
specified, should that be encoded in the DL archive somehow?
We only have one archive per repository, not per branch.
That has hurt in the past for the opposite reason. (Binutils) someone creates
branch ABC. Then they remove branch ABC, and create a new branch ABC/XYZ.

If you don't prune ABC, then you can't create the directory ABC to have XYZ in
it. (So you can't win in this case, you HAVE to prune or you HAVE to ignore
the new branch.)

(This likely won't be an issue for master/main -- but it's something to be aware
of.)

--Mark

Cheers,

Richard




Mark Hatle
 

On 7/7/20 9:16 AM, Richard Purdie wrote:
On Tue, 2020-07-07 at 08:58 -0500, Joshua Watt wrote:

On 7/7/20 6:42 AM, Richard Purdie wrote:
A number of upstream git repos we build from are transitioning "master"
branches to "main" branches. They're doing this and removing the old
name.

The scale of the problem this causes us is only just becoming apparent.
iso-codes did this, I tested a patch to update master-next. Everything
was fine until I did this as DL_DIR has "master" in it.

After I tested the change in master-next, the new main branch was added
to DL_DIR and the old master branch was removed. This broke master
which now no longer had the correct source in the mirror or from
upstream. It will have also broken dunfell and perhaps a number of
other releases. Those do have sources from the release period we could
use but they're not configured to fall back to them at present (they
probably should be?).
I'm a little confused; is the old SHA1 not an ancestor of the new
branch head? I would have expected the required SHA1 to be in DL_DIR
just the same as if the master branch head had moved?
The fetcher is strict about which branch the SHA1 is on. There were
good reasons we started enforcing that, I have to admit I don't
remember the reasons offhand. It's that which is tripping things up
though.
One of the reasons we were enforcing this: there were people doing package
updates, calling a package one version and pulling the source from a completely
different branch.

It was making it difficult to properly name the version, look for CVEs, etc.

Is this caused because the SRC_URI specifies a branch and the fetcher
only downloads that specific branch from upstream? If a branch is
specified, should that be encoded in the DL archive somehow?
We only have one archive per repository, not per branch.
That has hurt in the past for the opposite reason. (Binutils) someone creates
branch ABC. Then they remove branch ABC, and create a new branch ABC/XYZ.

If you don't prune ABC, then you can't create the directory ABC to have XYZ in
it. (So you can't win in this case, you HAVE to prune or you HAVE to ignore
the new branch.)

(This likely won't be an issue for master/main -- but it's something to be aware
of.)
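
The ABC versus ABC/XYZ clash can be sketched as a simple name check. This is an illustration only, assuming refs are stored as files so that one ref name cannot also be a directory prefix of another; conflicting_refs is a made-up helper, not part of git or the fetcher.

```python
def conflicting_refs(existing, incoming):
    """Return (old, new) pairs where one name is a directory prefix of the other."""
    conflicts = []
    for old in existing:
        for new in incoming:
            if new.startswith(old + "/") or old.startswith(new + "/"):
                conflicts.append((old, new))
    return conflicts


# The binutils-style case: ABC was deleted upstream and ABC/XYZ created.
clash = conflicting_refs(["ABC"], ["ABC/XYZ"])

# A plain rename such as master -> main does not collide.
rename = conflicting_refs(["master"], ["main"])
```

Any pair reported in `clash` is one where the mirror must either prune the old ref or skip the new one; `rename` comes back empty, matching the expectation that master/main itself is not affected.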

--Mark

Cheers,

Richard




Martin Jansa
 

On Tue, Jul 07, 2020 at 03:16:06PM +0100, Richard Purdie wrote:
The fetcher is strict about which branch the SHA1 is on. There were
good reasons we started enforcing that, I have to admit I don't
remember the reasons offhand. It's that which is tripping things up
though.
One of the reasons which I still find useful is that switching from
SRCREV in the recipe to AUTOREV gives you the latest commit from the
same branch the locked SRCREV was on. I don't know how important this is
for other people's workflows.


Richard Purdie
 

On Tue, 2020-07-07 at 08:58 -0500, Joshua Watt wrote:

On 7/7/20 6:42 AM, Richard Purdie wrote:
A number of upstream git repos we build from are transitioning "master"
branches to "main" branches. They're doing this and removing the old
name.

The scale of the problem this causes us is only just becoming apparent.
iso-codes did this, I tested a patch to update master-next. Everything
was fine until I did this as DL_DIR has "master" in it.

After I tested the change in master-next, the new main branch was added
to DL_DIR and the old master branch was removed. This broke master
which now no longer had the correct source in the mirror or from
upstream. It will have also broken dunfell and perhaps a number of
other releases. Those do have sources from the release period we could
use but they're not configured to fall back to them at present (they
probably should be?).
I'm a little confused; is the old SHA1 not an ancestor of the new
branch head? I would have expected the required SHA1 to be in DL_DIR
just the same as if the master branch head had moved?
The fetcher is strict about which branch the SHA1 is on. There were
good reasons we started enforcing that, I have to admit I don't
remember the reasons offhand. It's that which is tripping things up
though.

Is this caused because the SRC_URI specifies a branch and the fetcher
only downloads that specific branch from upstream? If a branch is
specified, should that be encoded in the DL archive somehow?
We only have one archive per repository, not per branch.

Cheers,

Richard


Joshua Watt
 


On 7/7/20 6:42 AM, Richard Purdie wrote:
A number of upstream git repos we build from are transitioning "master"
branches to "main" branches. They're doing this and removing the old
name.

The scale of the problem this causes us is only just becoming apparent.
iso-codes did this, I tested a patch to update master-next. Everything
was fine until I did this as DL_DIR has "master" in it.

After I tested the change in master-next, the new main branch was added
to DL_DIR and the old master branch was removed. This broke master
which now no longer had the correct source in the mirror or from
upstream. It will have also broken dunfell and perhaps a number of
other releases. Those do have sources from the release period we could
use but they're not configured to fall back to them at present (they
probably should be?).

I'm a little confused; is the old SHA1 not an ancestor of the new branch head? I would have expected the required SHA1 to be in DL_DIR just the same as if the master branch head had moved?


Is this caused because the SRC_URI specifies a branch and the fetcher only downloads that specific branch from upstream? If a branch is specified, should that be encoded in the DL archive somehow?



I suspect this is going to be a growing trend so we're going to have to
adapt our mirroring to cope better with this, perhaps by not removing
any branches/heads, only ever updating/changing.

I wanted to give people a heads up that this is going to be an
increasing problem.

Cheers,

Richard



Adrian Bunk
 

On Tue, Jul 07, 2020 at 12:42:14PM +0100, Richard Purdie wrote:
...
I suspect this is going to be a growing trend so we're going to have to
adapt our mirroring to cope better with this, perhaps by not removing
any branches/heads, only ever updating/changing.
...
What about making the git fetcher default to nobranch instead
of branch=master?

Using a branch name made sense back in the days when projects were not
deleting branches used downstream.

Having a default branch set in the git fetcher made sense back in the
days when there was agreement on the name of a development branch.

A branch in SRC_URI is just an optional automatic check that the commit
used is on that branch, it is not necessary.

Cheers,

Richard
cu
Adrian

BTW: In most cases a better check would be implementing #13303.


Richard Purdie
 

A number of upstream git repos we build from are transitioning "master"
branches to "main" branches. They're doing this and removing the old
name.

The scale of the problem this causes us is only just becoming apparent.
iso-codes did this, I tested a patch to update master-next. Everything
was fine until I did this as DL_DIR has "master" in it.

After I tested the change in master-next, the new main branch was added
to DL_DIR and the old master branch was removed. This broke master
which now no longer had the correct source in the mirror or from
upstream. It will have also broken dunfell and perhaps a number of
other releases. Those do have sources from the release period we could
use but they're not configured to fall back to them at present (they
probably should be?).

I suspect this is going to be a growing trend so we're going to have to
adapt our mirroring to cope better with this, perhaps by not removing
any branches/heads, only ever updating/changing.
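
As a rough sketch of "never remove, only update", the mirror's view of branch heads could be merged like this. It is a toy model, not the current mirroring code: heads are plain dicts of branch name to commit id, the ids are placeholders, and merge_heads is a made-up name.

```python
def merge_heads(mirror_heads, upstream_heads):
    """Update or add heads from upstream without dropping any the mirror has."""
    merged = dict(mirror_heads)      # keep everything already mirrored
    merged.update(upstream_heads)    # move existing heads, add new ones
    return merged


# Mirror still has the old name; upstream now only offers the new one.
mirror = {"master": "1111"}
upstream = {"main": "2222"}
after_sync = merge_heads(mirror, upstream)
```

After the sync both master and main are present, so a release that still refers to master keeps finding its source in the mirror.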

I wanted to give people a heads up that this is going to be an
increasing problem.

Cheers,

Richard