Re: Concerns about multilib bugs

Joshua Watt

On 4/22/20 5:11 AM, Richard Purdie wrote:
On Tue, 2020-04-21 at 21:51 -0500, Joshua Watt wrote:
On Sun, Apr 19, 2020 at 12:04 PM Richard Purdie
<richard.purdie@...> wrote:
Continuing the discussion from the OE TSC meeting, now that I've had
a little time to digest the idea: I don't think the filter functions
are necessarily a bad idea. I think the real problem is how to
prevent them from becoming API maintenance and debugging problems,
and as you stated, performance issues. I think there might be a few
ways to address this, and most of it comes down to strictly defining
the API.
Firstly, I don't think we want to allow (at least at this point)
completely arbitrary python functions to be called as a filter.
Hopefully, we can figure out the few filter manipulations that are
required and design the API such that those can be used without
having to open it all the way up.
I think that is a sensible idea.

The good news is we know roughly what the current needs look like, the
code is in lib/oe/

    def extend_name(self, name):
        if name.startswith("kernel-") or name == "virtual/kernel":
            return name
        if name.startswith("rtld"):
            return name
        if name.endswith("-crosssdk"):
            return name
        if name.endswith("-" + self.extname):
            name = name.replace("-" + self.extname, "")
        if name.startswith("virtual/"):
            subs = name.split("/", 1)[1]
            if not subs.startswith(self.extname):
                return "virtual/" + self.extname + "-" + subs
            return name
        if name.startswith("/") or (name.startswith("${") and name.endswith("}")):
            return name
        if not name.startswith(self.extname):
            return self.extname + "-" + name
        return name

    def map_depends(self, dep):
        if dep.endswith(("-native", "-native-runtime")) or ('nativesdk-' in dep) or ('cross-canadian' in dep) or ('-crosssdk-' in dep):
            return dep
            # Do not extend for that already have multilib prefix
            var = self.d.getVar("MULTILIB_VARIANTS")
            if var:
                var = var.split()
                for v in var:
                    if dep.startswith(v):
                        return dep
            return self.extend_name(dep)

which apart from one getVar call, is all static string manipulation.
Could be better, could be worse. Its surprisingly hard to manipulate a
depends string (we also have to use the explode_deps function as the
strings can have versions in).

 Secondly, I think we should assume that any
filter function is "pure" [1], meaning it only takes in the input
variable value string, possibly some other manipulation arguments,
and outputs the new variable value string. Notable, it should *not*
take in the datastore object, since this is inherently global data
that can change at any time. Hopefully, this will allow optimizations
and prevent performance issues because bitbake can keep a cache of
filter function inputs mapped to outputs. Since the function outputs
*only* depend on the inputs, the filter can avoid being called
multiple times if the input set has been seen before.
That definitely looks like a good move and alleviates some of the
concerns I have about it.

I do wonder how we handle MULTILIB_VARIANTS in the above example.
Perhaps some kind of cut down key/value pairs or allow parameters?

I can dream up a few ways this might look. In this example, I'm
defining a filter function that simply adds a defined prefix to every
value in a space separated variable.

First, the filter function itself:

 def filter_prefix(in, prefix):
     return ' '.join(prefix + v for v in in.split())

As for how to express this in a recipe, I can think of a few ways.
This first one is a variable flag. The base flag is "filter" and the
suffix indicates which filter is applied (in this case, the "prefix"
function). The value are the extra arguments passed to the function.

 DEPENDS[filter:prefix] = "${MLPREFIX}"

Another option would be to encode the function and arguments in the
flag value, which might also allow filters to be chained (although, I
would hope that's a rare requirement!):

 DEPENDS[filter] = "prefix ${MLPREFIX}"

Chaining might look something like:

 DEPENDS[filter] = "prefix ${MLPREFIX}; suffix ${MY_SUFFIX}"

The great part about pure functions here is that the property is
transitive, so the entire result of chain with both operations (e.g.
suffix(prefix(${DEPENDS}))) can be cached if desired.
Those are good suggestions. Thinking out loud, I'm kind of drawn to
something which looks a bit more function like:

DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')"
I was trying to avoid anything too complex to parse.... but I wonder if we can enlist the help of the python interpreter. If we have an appropriate delimiter, something like this might work:

 def apply_filter(value, filter_expression):
     class Val:
         def __init__(self, value):
             self.value = value
         def prefix(self, prefix):
             self.value = ' '.join(prefix + v for v in self.__value.split())
         def suffix(self, suffix):
             self.value = ' '.join(v + suffix for v in self.__value.split())

     v = Val(value)
     for e in filter_expression.split(';'):
         e = e.strip()
         if not e:
         eval(e, globals={}, locals={'v': v})
     return v.value

Obviously, you might want something a little more dynamic for registering the actual filter functions into the class, but the important part is that the eval() is pretty restrictive in what filter functions are allowed to access or do.

This would allow the user to write a fairly natural expression like:

 DEPENDS[filter] = "v.prefix('${MLPREFIX}'); v.suffix('${MY_SUFFIX}')

but, things like this also still work (since the code ignores empty expressions):

 DEPEND[filter] += "; v.prefix('foo')"

The hardest part is finding a delimiter that isn't likely to show up in the actual variable values, since the string has to be split and duplicates ignored before it is parsed by the python code.

as it might be more obvious and searchable through grep? I don't like
";" as a delimiter as it doesn't work well with append/prepend of
values. Its a tradeoff between the need for some parameters to be
passed in and complexity.

I do like the fact we can cache this approach, that will help a lot
with performance.

There's probably a few other ways to express this, but the idea is
that it's more tightly controlled about what can be applied.

Its definitely the way to go...



Join to automatically receive all group messages.