Concerns about multilib bugs


Richard Purdie
 

I'm a bit worried about some of the multilib issues we've seen
recently. In particular we have a bug where license exclusion doesn't
work properly in multilib builds, which is fairly serious.

To dive into the details of what I found, as things stand today, the
code sequence is:

a) bb.event.RecipePreFinalise
b) bb.data.expandKeys()
c) base.bbclass anonymous python
d) multilib.bbclass anonymous python

The multilib code does half the work in a) but delays half to d) since
it has to happen after key expansion; there is no way to avoid that.

The issue is that there is code in base.bbclass's anonymous python
(step c) that assumes the package information in the datastore is
correct, yet it isn't until after d).

For example if you set:

PACKAGES += "X-bar"
LICENSE_${PN}-bar = "Y"
(and PN is X)

then base.bbclass can't see the LICENSE value by iterating PACKAGES.
Yes, you could argue PACKAGES should match the keys used but that is
hard to enforce and really shouldn't be necessary.
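To make the failure mode concrete, here is an illustrative sketch (plain python, not real bitbake code) of the lookup mismatch, assuming PN has already been renamed to lib32-X while the literal PACKAGES entry has not:

```python
# Simulated datastore state when base.bbclass's anonymous python runs
# in a multilib build: the LICENSE key was expanded with the new PN,
# but the literal PACKAGES entry has not been renamed yet.
datastore = {
    "PN": "lib32-X",
    "PACKAGES": "X-bar",          # literal name, not yet prefixed
    "LICENSE_lib32-X-bar": "Y",   # LICENSE_${PN}-bar after key expansion
}

def find_package_licenses(d):
    # Mimics iterating PACKAGES and looking up the per-package LICENSE
    found = {}
    for pkg in d["PACKAGES"].split():
        key = "LICENSE_" + pkg    # looks for LICENSE_X-bar
        if key in d:
            found[pkg] = d[key]
    return found

print(find_package_licenses(datastore))  # {} - the override is invisible
```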

Having debugged this I decided an "easy" fix would be to add a new
event, bb.event.RecipePostKeyExpansion which happens after key
expansion and before anonymous python. This updates the flow to:

a) bb.event.RecipePreFinalise
b) bb.data.expandKeys()
c) multilib bb.event.RecipePostKeyExpansion
d) base.bbclass anonymous python

However, things break :(.

There is anonymous python that assumes it runs before multilib (e.g.
mesa), so we have to scatter MLPREFIX references through that code to
make it work again: after this change, all anonymous python runs after
multilib, so it must use MLPREFIX in any changes it makes. OK, fine,
this at least gives us a consistent API.

Sadly, things still break. For example:

DEPENDS += "${@bb.utils.contains('GI_DATA_ENABLED', 'True', 'libxslt-native', '', d)}"

Since this is expanded by the multilib RecipePostKeyExpansion handler,
the expansion happens before DISTRO_FEATURES is manipulated by
base.bbclass, and the value is lost.

PACKAGES += "${@bb.utils.contains('PACKAGECONFIG', 'pulseaudio', 'alsa-plugins-pulseaudio-conf', '', d)}"

which also suffered from effectively the same issue.

There were also problems from:

RRECOMMENDS_${PN} += "${BOOST_PACKAGES}"

Since BOOST_PACKAGES is unset until after anonymous python runs, the
new multilib code tried to create packages like
"lib32-${BOOST_PACKAGES}", which later translated to "lib32-lib32-boost-A
lib32-boost-B".

I'm struggling to decide how best we can fix much of this. One
approach is to avoid expanding PACKAGES and DEPENDS; instead we can
wrap them, saving the unexpanded value to some hidden variable and
creating a function which dynamically expands them on the fly. There
are patches to do this in master-next. It's rather ugly magic but does
appear to work; I'm testing on the autobuilder at the moment to see if
it does fix the issues.
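As a rough illustration of the wrapping idea (a simplified stand-in, not the actual master-next patches), the unexpanded value is parked in a hidden variable and the visible one becomes a callable that applies the prefix only when the value is read:

```python
# Toy datastore: values may be plain strings or callables that are
# evaluated lazily at read time.
class DataStore:
    def __init__(self):
        self._vars = {}

    def set_var(self, key, value):
        self._vars[key] = value

    def get_var(self, key):
        value = self._vars[key]
        return value(self) if callable(value) else value

def wrap_variable(d, key, mlprefix):
    # Park the raw value and replace the variable with a function that
    # applies the multilib prefix on the fly, whenever it is read.
    d.set_var("__raw_" + key, d._vars[key])
    d.set_var(key, lambda ds: " ".join(
        mlprefix + item for item in ds._vars["__raw_" + key].split()))

d = DataStore()
d.set_var("DEPENDS", "zlib openssl")
wrap_variable(d, "DEPENDS", "lib32-")
print(d.get_var("DEPENDS"))  # lib32-zlib lib32-openssl
```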

It's possible we could create a new bitbake API for this, with
variables having "filter" functions called when the value is returned;
I just worry about the performance overheads.

Does anyone have any other ideas or opinions on this?

Cheers,

Richard


Joshua Watt
 

On Sun, Apr 19, 2020 at 12:04 PM Richard Purdie
<richard.purdie@...> wrote:

[...]
Continuing the discussion from the OE TSC meeting, now that I've had a
little time to digest the idea: I don't think the filter functions are
necessarily a bad idea. I think the real problem is how to prevent
them from becoming API maintenance and debugging problems, and as you
stated, performance issues. I think there might be a few ways to
address this, and most of it comes down to strictly defining the API.
Firstly, I don't think we want to allow (at least at this point)
completely arbitrary python functions to be called as a filter.
Hopefully, we can figure out the few filter manipulations that are
required and design the API such that those can be used without having
to open it all the way up. Secondly, I think we should assume that any
filter function is "pure" [1], meaning it only takes in the input
variable value string, possibly some other manipulation arguments, and
outputs the new variable value string. Notably, it should *not* take
in the datastore object, since this is inherently global data that can
change at any time. Hopefully, this will allow optimizations and
prevent performance issues because bitbake can keep a cache of filter
function inputs mapped to outputs. Since the function outputs *only*
depend on the inputs, the filter can avoid being called multiple times
if the input set has been seen before.
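A minimal sketch of the memoisation this purity would allow (the names here are illustrative, not proposed bitbake API):

```python
from functools import lru_cache

# Because a pure filter depends only on its string arguments, its
# results can be cached keyed on those arguments alone.
@lru_cache(maxsize=None)
def filter_prefix(value, prefix):
    return " ".join(prefix + v for v in value.split())

filter_prefix("zlib openssl", "lib32-")   # computed
filter_prefix("zlib openssl", "lib32-")   # served from the cache
print(filter_prefix.cache_info().hits)    # 1
```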

I can dream up a few ways this might look. In this example, I'm
defining a filter function that simply adds a defined prefix to every
value in a space separated variable.

First, the filter function itself:

def filter_prefix(value, prefix):
    return ' '.join(prefix + v for v in value.split())

As for how to express this in a recipe, I can think of a few ways.
This first one is a variable flag. The base flag is "filter" and the
suffix indicates which filter is applied (in this case, the "prefix"
function). The value is the extra arguments passed to the function.

DEPENDS[filter:prefix] = "${MLPREFIX}"

Another option would be to encode the function and arguments in the
flag value, which might also allow filters to be chained (although, I
would hope that's a rare requirement!):

DEPENDS[filter] = "prefix ${MLPREFIX}"

Chaining might look something like:

DEPENDS[filter] = "prefix ${MLPREFIX}; suffix ${MY_SUFFIX}"

The great part about pure functions here is that the property
composes, so the entire result of a chain with both operations (e.g.
suffix(prefix(${DEPENDS}))) can be cached if desired.

There are probably a few other ways to express this, but the idea is
that what can be applied is more tightly controlled.


[1]: https://en.wikipedia.org/wiki/Pure_function


Does anyone have any other ideas or opinions on this?

Cheers,

Richard



 

Paul Barker

On Sun, 19 Apr 2020 18:04:34 +0100
"Richard Purdie" <richard.purdie@...> wrote:

[...]
For example if you set:

PACKAGES += "X-bar"
LICENSE_${PN}-bar = "Y"
(and PN is X)

then base.bbclass can't see the LICENSE value by iterating PACKAGES.
Yes, you could argue PACKAGES should match the keys used but that is
hard to enforce and really shouldn't be necessary.
At first glance this appears to be a mismatch between PACKAGES and the
override used on LICENSE. It should either be:

PACKAGES += "X-bar"
LICENSE_X-bar = "Y"

or

PACKAGES += "${PN}-bar"
LICENSE_${PN}-bar = "Y"

but mixing those is always going to lead to pain somewhere in my
experience.

Of the two I think using `${PN}` is the correct one otherwise you'd get
a clash between the package names when building with multilib (so you
get `X-bar` and `lib32-X-bar` for example instead of two packages both
named `X-bar`).

We've already got warnings for `${PN}` used in places where `${BPN}`
should really be used. Perhaps this is a case for something similar
like raising a warning if PACKAGES contains the value of PN as a
literal instead of using `${PN}`?
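A sketch of what such a check might look like (purely hypothetical, not existing QA code), operating on the unexpanded PACKAGES value:

```python
def check_packages_literal_pn(packages_raw, pn):
    # Flag entries that embed the literal PN value instead of ${PN}.
    # packages_raw is the *unexpanded* PACKAGES value.
    return [p for p in packages_raw.split()
            if pn in p and "${PN}" not in p]

print(check_packages_literal_pn("${PN}-dev X-bar", "X"))  # ['X-bar']
```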

Thanks,

--
Paul Barker
Konsulko Group


Richard Purdie
 

On Wed, 2020-04-22 at 08:21 +0100, Paul Barker wrote:
On Sun, 19 Apr 2020 18:04:34 +0100
"Richard Purdie" <richard.purdie@...> wrote:

[...]
For example if you set:

PACKAGES += "X-bar"
LICENSE_${PN}-bar = "Y"
(and PN is X)

then base.bbclass can't see the LICENSE value by iterating PACKAGES.
Yes, you could argue PACKAGES should match the keys used but that is
hard to enforce and really shouldn't be necessary.
At first glance this appears to be a mismatch between PACKAGES and the
override used on LICENSE. It should either be:

PACKAGES += "X-bar"
LICENSE_X-bar = "Y"

or

PACKAGES += "${PN}-bar"
LICENSE_${PN}-bar = "Y"

but mixing those is always going to lead to pain somewhere in my
experience.
Right, mixing them definitely gives the worst possible behaviours.
Sadly we have recipes doing this today, with unpredictable side
effects. We have always said we don't support this, but it does mostly
work, is hard to spot, and is even more painful to debug when it
breaks.

Of the two I think using `${PN}` is the correct one otherwise you'd
get a clash between the package names when building with multilib (so
you get `X-bar` and `lib32-X-bar` for example instead of two packages
both named `X-bar`).
One solution is to use ${PN} or ${MLPREFIX} everywhere. The multilib
class exists mainly because we didn't want to go through every recipe
and mark things up that way, believing instead that the code could
automate it.

We've already got warnings for `${PN}` used in places where `${BPN}`
should really be used. Perhaps this is a case for something similar
like raising a warning if PACKAGES contains the value of PN as a
literal instead of using `${PN}`?
This isn't possible as we have a lot of recipes generating packages
with names that don't contain PN...

Cheers,

Richard


Richard Purdie
 

On Tue, 2020-04-21 at 21:51 -0500, Joshua Watt wrote:
On Sun, Apr 19, 2020 at 12:04 PM Richard Purdie
<richard.purdie@...> wrote:
Continuing the discussion from the OE TSC meeting, now that I've had
a little time to digest the idea: I don't think the filter functions
are necessarily a bad idea. I think the real problem is how to
prevent them from becoming API maintenance and debugging problems,
and as you stated, performance issues. I think there might be a few
ways to address this, and most of it comes down to strictly defining
the API.
Firstly, I don't think we want to allow (at least at this point)
completely arbitrary python functions to be called as a filter.
Hopefully, we can figure out the few filter manipulations that are
required and design the API such that those can be used without
having to open it all the way up.
I think that is a sensible idea.

The good news is we know roughly what the current needs look like, the
code is in lib/oe/classextend.py:

    def extend_name(self, name):
        if name.startswith("kernel-") or name == "virtual/kernel":
            return name
        if name.startswith("rtld"):
            return name
        if name.endswith("-crosssdk"):
            return name
        if name.endswith("-" + self.extname):
            name = name.replace("-" + self.extname, "")
        if name.startswith("virtual/"):
            subs = name.split("/", 1)[1]
            if not subs.startswith(self.extname):
                return "virtual/" + self.extname + "-" + subs
            return name
        if name.startswith("/") or (name.startswith("${") and name.endswith("}")):
            return name
        if not name.startswith(self.extname):
            return self.extname + "-" + name
        return name

    def map_depends(self, dep):
        if dep.endswith(("-native", "-native-runtime")) or ('nativesdk-' in dep) or ('cross-canadian' in dep) or ('-crosssdk-' in dep):
            return dep
        else:
            # Do not extend for that already have multilib prefix
            var = self.d.getVar("MULTILIB_VARIANTS")
            if var:
                var = var.split()
                for v in var:
                    if dep.startswith(v):
                        return dep
            return self.extend_name(dep)

which, apart from one getVar call, is all static string manipulation.
Could be better, could be worse. It's surprisingly hard to manipulate a
depends string (we also have to use the explode_deps function, as the
strings can have versions in them).
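To illustrate why the plain string is awkward to manipulate, here is a simplified stand-in for that kind of parsing (the real bb.utils.explode_deps handles this properly; this sketch just shows the problem):

```python
import re

def explode_deps(depstr):
    # Keep "name (op version)" together as a single item, instead of
    # letting a plain split() shred the version constraint.
    return re.findall(r"\S+(?:\s*\([^)]*\))?", depstr)

deps = "lib32-foo bar (>= 1.2) baz"
print(deps.split())        # ['lib32-foo', 'bar', '(>=', '1.2)', 'baz']
print(explode_deps(deps))  # ['lib32-foo', 'bar (>= 1.2)', 'baz']
```

A prefix-renaming filter has to operate on the second form, or it would mangle the version constraints.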

Secondly, I think we should assume that any
filter function is "pure" [1], meaning it only takes in the input
variable value string, possibly some other manipulation arguments,
and outputs the new variable value string. Notably, it should *not*
take in the datastore object, since this is inherently global data
that can change at any time. Hopefully, this will allow optimizations
and prevent performance issues because bitbake can keep a cache of
filter function inputs mapped to outputs. Since the function outputs
*only* depend on the inputs, the filter can avoid being called
multiple times if the input set has been seen before.
That definitely looks like a good move and alleviates some of the
concerns I have about it.

I do wonder how we handle MULTILIB_VARIANTS in the above example.
Perhaps some kind of cut-down key/value pairs, or allowing parameters?

I can dream up a few ways this might look. In this example, I'm
defining a filter function that simply adds a defined prefix to every
value in a space separated variable.

First, the filter function itself:

def filter_prefix(value, prefix):
    return ' '.join(prefix + v for v in value.split())

As for how to express this in a recipe, I can think of a few ways.
This first one is a variable flag. The base flag is "filter" and the
suffix indicates which filter is applied (in this case, the "prefix"
function). The value are the extra arguments passed to the function.

DEPENDS[filter:prefix] = "${MLPREFIX}"

Another option would be to encode the function and arguments in the
flag value, which might also allow filters to be chained (although, I
would hope that's a rare requirement!):

DEPENDS[filter] = "prefix ${MLPREFIX}"

Chaining might look something like:

DEPENDS[filter] = "prefix ${MLPREFIX}; suffix ${MY_SUFFIX}"

The great part about pure functions here is that the property is
transitive, so the entire result of chain with both operations (e.g.
suffix(prefix(${DEPENDS}))) can be cached if desired.
Those are good suggestions. Thinking out loud, I'm kind of drawn to
something which looks a bit more function-like:

DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')"

as it might be more obvious and searchable through grep? I don't like
";" as a delimiter as it doesn't work well with append/prepend of
values. It's a tradeoff between the need for some parameters to be
passed in and complexity.
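One way such a function-like syntax could stay cheap and safe to parse is a fixed registry of allowed filters plus a simple regex match, so no arbitrary python ever runs (purely a sketch; all names here are illustrative):

```python
import re

# Registry of permitted, pure filter functions.
FILTERS = {
    "prefix_filter": lambda value, arg: " ".join(arg + v for v in value.split()),
    "suffix_filter": lambda value, arg: " ".join(v + arg for v in value.split()),
}

# Matches calls of the form name('argument').
CALL_RE = re.compile(r"(\w+)\('([^']*)'\)")

def apply_filters(value, flag):
    # Apply each recognised call left to right; anything else is ignored.
    for name, arg in CALL_RE.findall(flag):
        value = FILTERS[name](value, arg)
    return value

print(apply_filters("zlib openssl",
                    "prefix_filter('lib32-') suffix_filter('-dev')"))
# lib32-zlib-dev lib32-openssl-dev
```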

I do like the fact we can cache this approach, that will help a lot
with performance.

There's probably a few other ways to express this, but the idea is
that it's more tightly controlled about what can be applied.

[1]: https://en.wikipedia.org/wiki/Pure_function
It's definitely the way to go...

Cheers,

Richard


Chris Laplante
 

DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')"

as it might be more obvious and searchable through grep? I don't like ";" as a
delimiter as it doesn't work well with append/prepend of values. It's a
tradeoff between the need for some parameters to be passed in and
complexity.

I do like the fact we can cache this approach, that will help a lot with
performance.

There's probably a few other ways to express this, but the idea is
that it's more tightly controlled about what can be applied.
Out of curiosity, do you anticipate that this will be used for any part of the implementation of multiconfig as well?

Chris


Richard Purdie
 

On Wed, 2020-04-22 at 13:45 +0000, chris.laplante@... wrote:
DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')"

as it might be more obvious and searchable through grep? I don't like
";" as a delimiter as it doesn't work well with append/prepend of
values. It's a tradeoff between the need for some parameters to be
passed in and complexity.

I do like the fact we can cache this approach, that will help a lot
with performance.

There's probably a few other ways to express this, but the idea is
that it's more tightly controlled about what can be applied.
Out of curiosity, do you anticipate that this will be used for any
part of the implementation of multiconfig as well?
Not that I see at the moment. Only for native/nativesdk/multilib
(anything using the class extension code).

Cheers,

Richard


Joshua Watt
 


On 4/22/20 5:11 AM, Richard Purdie wrote:
[...]
Those are good suggestions. Thinking out loud, I'm kind of drawn to
something which looks a bit more function like:

DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')"
I was trying to avoid anything too complex to parse... but I wonder if
we can enlist the help of the python interpreter. If we have an
appropriate delimiter, something like this might work:

 def apply_filter(value, filter_expression):
     class Val:
         def __init__(self, value):
             self.value = value

         def prefix(self, prefix):
             self.value = ' '.join(prefix + v for v in self.value.split())

         def suffix(self, suffix):
             self.value = ' '.join(v + suffix for v in self.value.split())

     v = Val(value)
     for e in filter_expression.split(';'):
         e = e.strip()
         if not e:
             continue
         eval(e, {}, {'v': v})
     return v.value

Obviously, you might want something a little more dynamic for
registering the actual filter functions into the class, but the
important part is that the eval() is pretty restrictive in what filter
functions are allowed to access or do.

This would allow the user to write a fairly natural expression like:

 DEPENDS[filter] = "v.prefix('${MLPREFIX}'); v.suffix('${MY_SUFFIX}')"

but, things like this also still work (since the code ignores empty expressions):

 DEPENDS[filter] += "; v.prefix('foo')"

The hardest part is finding a delimiter that isn't likely to show up
in the actual variable values, since the string has to be split and
empty expressions ignored before it is parsed by the python code.



as it might be more obvious and searchable through grep? I don't like
";" as a delimiter as it doesn't work well with append/prepend of
values. Its a tradeoff between the need for some parameters to be
passed in and complexity.

I do like the fact we can cache this approach, that will help a lot
with performance.

There's probably a few other ways to express this, but the idea is
that it's more tightly controlled about what can be applied.

[1]: https://en.wikipedia.org/wiki/Pure_function
Its definitely the way to go...

Cheers,

Richard




Quentin Schulz <quentin.schulz@...>
 

Hi all,

Giving my 2¢.

On Wed, Apr 22, 2020 at 09:10:42AM +0100, Richard Purdie wrote:
On Wed, 2020-04-22 at 08:21 +0100, Paul Barker wrote:

Of the two I think using `${PN}` is the correct one otherwise you'd
get a clash between the package names when building with multilib (so
you get `X-bar` and `lib32-X-bar` for example instead of two packages
both named `X-bar`).
One solution is to use ${PN} or ${MLPREFIX} everywhere. The multilib
class exists mainly as we didn't want to go through every recipe and
mark things up that way, believing instead the code could automate it.
Indeed, and the thing is... it does not fix the original issue, so
third-party layers not doing that will still have the same bug, which
is hard to detect. Since we can't enforce it, that does not seem to be
the solution.

We've already got warnings for `${PN}` used in places where `${BPN}`
should really be used. Perhaps this is a case for something similar
like raising a warning if PACKAGES contains the value of PN as a
literal instead of using `${PN}`?
This isn't possible as we have a lot of recipes generating packages
with names that don't contain PN...
Also, this looks like a dangerous game of whack-a-mole. We know it
breaks for LICENSE; does it break for something else we haven't found
yet? Are we going to forget to add a warning once a new feature could
have the same bogus behaviour?

It's like silencing the actual issue and giving us a false sense of safety.

Thanks for looking into the issue and summarising it much better than
the book I sent as a bug report :)
Quentin


Quentin Schulz <quentin.schulz@...>
 

Hi all,

I still had a couple of ¢ to give so here they are.

On Wed, Apr 22, 2020 at 09:30:23AM -0500, Joshua Watt wrote:

On 4/22/20 5:11 AM, Richard Purdie wrote:
[...]
Those are good suggestions. Thinking out loud, I'm kind of drawn to
something which looks a bit more function like:

DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')"
I was trying to avoid anything too complex to parse... but I wonder if
we can enlist the help of the python interpreter. If we have an
appropriate delimiter, something like this might work:

def apply_filter(value, filter_expression):
    class Val:
        def __init__(self, value):
            self.value = value

        def prefix(self, prefix):
            self.value = ' '.join(prefix + v for v in self.value.split())

        def suffix(self, suffix):
            self.value = ' '.join(v + suffix for v in self.value.split())

    v = Val(value)
    for e in filter_expression.split(';'):
        e = e.strip()
        if not e:
            continue
        eval(e, {}, {'v': v})
    return v.value

Obviously, you might want something a little more dynamic for registering the actual filter functions into the class, but the important part is that the eval() is pretty restrictive in what filter functions are allowed to access or do.

This would allow the user to write a fairly natural expression like:

DEPENDS[filter] = "v.prefix('${MLPREFIX}'); v.suffix('${MY_SUFFIX}')"

but, things like this also still work (since the code ignores empty expressions):

DEPENDS[filter] += "; v.prefix('foo')"

The hardest part is finding a delimiter that isn't likely to show up
in the actual variable values, since the string has to be split and
empty expressions ignored before it is parsed by the python code.
I summoned the Google god and found this Stack Overflow answer on how
to execute multiple statements instead of the single expression eval
expects:
https://stackoverflow.com/a/39381428
https://docs.python.org/3/library/ast.html

If I haven't completely misunderstood what was suggested, that should
eliminate the need for splitting. Though... I don't know how all of
this works with respect to basic security against code injection
(e.g. DEPENDS[filter] += "import shutil; shutil.rmtree('/')")? I
barely know Python :)

Just to try to grasp what's suggested: are all filter flags going to
be resolved/executed at one specific point in time, or will the filter
of each variable be called explicitly when needed?

If the former, my simple question is: aren't we just adding one more
ordering issue somehow? I'm pretty sure someone will one day want this
filter flag expanded/resolved at a different time than the one we
choose.

If the latter, it seems error-prone, but at least we would have the
ability to fix a missing or misplaced filter rather easily.

Hopefully I haven't completely misunderstood the suggestion.

I'm afraid that's as far as I can participate in this issue from my
eagle's-eye view, as I'm lacking knowledge of Python and bitbake :/

Let me know if there's anything I can do,
Thanks,
Quentin