Date
1 - 10 of 10
Concerns about multilib bugs
Richard Purdie
I'm a bit worried about some of the mulitlib issues we've seen
recently. In particular we have a bug where license exclusion doesn't work properly in multilib builds, which is fairly serious. To dive into the details of what I found, as things stand today, the code sequence is: a) bb.event.RecipePreFinalise b) bb.data.expandKeys() c) base.bbclass anonymous python d) multilib.bbclass anonymous python The mutlilib code does half the work in a) but delays half to d) since it has to happen after python key expansion, there is no way to avoid that. The issue is that there is code in base.bbclass's anonymous python (step c) that assumes the package information in the datastore is correct, yet it isn't under after d). For example if you set: PACKAGES += "X-bar" LICENSE_${PN}-bar = "Y" (and PN is X) then base.bbclass can't see the LICENSE value by iterating PACKAGES. Yes, you could argue PACKAGES should match the keys used but that is hard to enforce and really shouldn't be necessary. Having debugged this I decided an "easy" fix would be to add a new event, bb.event.RecipePostKeyExpansion which happens after key expansion and before anonymous python. This updates the flow to: a) bb.event.RecipePreFinalise b) bb.data.expandKeys() c) multilib bb.event.RecipePostKeyExpansion d) base.bbclass anonymous python However things break :(. There is anonymous python that assumes it runs before multilib, e.g. mesa so we have to smatter in MLPREFIX references to make the code work again as after this change, all anonymous python runs after multilib so must use MLPREFIX in changes. Ok, fine, this at least gives us a consistent API. Sadly, things still break. For example: DEPENDS += "${@bb.utils.contains('GI_DATA_ENABLED', 'True', 'libxslt-native', '', d)}" since this is expanded by the multilib RecipePostKeyExpansion event, this happens before DISTRO_FEATURES is manipulated by base.bbclass and the value is lost. PACKAGES += "${@bb.utils.contains('PACKAGECONFIG', 'pulseaudio', 'alsa-plugins-pulseaudio-conf', '', d)}" which also suffered from effectively the same issue. There were also problems from: RECOMMENDS_${PN} += "${BOOST_PACKAGES}" Since BOOST_PACKAGES is unset until after anonymous python runs and the new multilib code tried to create packages like: "lib32-${BOOST_PACKAGES}" which translated to "lib32-lib32-boost-A lib32-boost-B". I'm struggling to decide how best we can fix much of this. One approach is to avoid expanding PACKAGES and DEPENDS, instead we can wrap them, saving the unexpanded value to some hidden variable and creating a function which dynamically expands them on the fly. There are patches to do this in master-next. Its rather ugly magic but does appear to work, I'm testing on the autobuilder atm just to see if it does fix issues. Its possible we could create a new bitbake API for this with variables having "filter" functions, called when the value is returned, I just worry about performance overheads. Does anyone have any other ideas or opinions on this? Cheers, Richard |
|
Joshua Watt
On Sun, Apr 19, 2020 at 12:04 PM Richard Purdie
<richard.purdie@...> wrote: Continuing the discussion from the OE TSC meeting, now that I've had a little time to digest the idea: I don't think the filter functions are necessarily a bad idea. I think the real problem is how to prevent them from becoming API maintenance and debugging problems, and as you stated, performance issues. I think there might be a few ways to address this, and most of it comes down to strictly defining the API. Firstly, I don't think we want to allow (at least at this point) completely arbitrary python functions to be called as a filter. Hopefully, we can figure out the few filter manipulations that are required and design the API such that those can be used without having to open it all the way up. Secondly, I think we should assume that any filter function is "pure" [1], meaning it only takes in the input variable value string, possibly some other manipulation arguments, and outputs the new variable value string. Notable, it should *not* take in the datastore object, since this is inherently global data that can change at any time. Hopefully, this will allow optimizations and prevent performance issues because bitbake can keep a cache of filter function inputs mapped to outputs. Since the function outputs *only* depend on the inputs, the filter can avoid being called multiple times if the input set has been seen before. I can dream up a few ways this might look. In this example, I'm defining a filter function that simply adds a defined prefix to every value in a space separated variable. First, the filter function itself: def filter_prefix(in, prefix): return ' '.join(prefix + v for v in in.split()) As for how to express this in a recipe, I can think of a few ways. This first one is a variable flag. The base flag is "filter" and the suffix indicates which filter is applied (in this case, the "prefix" function). The value are the extra arguments passed to the function. DEPENDS[filter:prefix] = "${MLPREFIX}" Another option would be to encode the function and arguments in the flag value, which might also allow filters to be chained (although, I would hope that's a rare requirement!): DEPENDS[filter] = "prefix ${MLPREFIX}" Chaining might look something like: DEPENDS[filter] = "prefix ${MLPREFIX}; suffix ${MY_SUFFIX}" The great part about pure functions here is that the property is transitive, so the entire result of chain with both operations (e.g. suffix(prefix(${DEPENDS}))) can be cached if desired. There's probably a few other ways to express this, but the idea is that it's more tightly controlled about what can be applied. [1]: https://en.wikipedia.org/wiki/Pure_function
|
|
On Sun, 19 Apr 2020 18:04:34 +0100
"Richard Purdie" <richard.purdie@...> wrote: I'm a bit worried about some of the mulitlib issues we've seenAt first glance this appears to be a mismatch between PACKAGES and the override used on LICENSE. It should either be: PACKAGES += "X-bar" LICENSE_X-bar = "Y" or PACKAGES += "${PN}-bar" LICENSE_${PN}-bar = "Y" but mixing those is always going to lead to pain somewhere in my experience. Of the two I think using `${PN}` is the correct one otherwise you'd get a clash between the package names when building with multilib (so you get `X-bar` and `lib32-X-bar` for example instead of two packages both named `X-bar`). We've already got warnings for `${PN}` used in places where `${BPN}` should really be used. Perhaps this is a case for something similar like raising a warning if PACKAGES contains the value of PN as a literal instead of using `${PN}`? Thanks, -- Paul Barker Konsulko Group |
|
Richard Purdie
On Wed, 2020-04-22 at 08:21 +0100, Paul Barker wrote:
On Sun, 19 Apr 2020 18:04:34 +0100Right, mixing them definitely gives the worst possible behaviours. Sadly we have recipes doing this today, with unpredictable side effects. We have always said we don't support this but it does mostly work and is hard to spot and even more painful to debug when it breaks. Of the two I think using `${PN}` is the correct one otherwise you'dOne solution is to use ${PN} or ${MLPREFIX} everywhere. The multilib class exists mainly as we didn't want to go through every recipe and mark things up that way, believing instead the code could automate it. We've already got warnings for `${PN}` used in places where `${BPN}`This isn't possible as we have a lot of recipes generating packages with names that don't contain PN... Cheers, Richard |
|
Richard Purdie
On Tue, 2020-04-21 at 21:51 -0500, Joshua Watt wrote:
On Sun, Apr 19, 2020 at 12:04 PM Richard PurdieI think that is a sensible idea. The good news is we know roughly what the current needs look like, the code is in lib/oe/classextend.py: def extend_name(self, name): if name.startswith("kernel-") or name == "virtual/kernel": return name if name.startswith("rtld"): return name if name.endswith("-crosssdk"): return name if name.endswith("-" + self.extname): name = name.replace("-" + self.extname, "") if name.startswith("virtual/"): subs = name.split("/", 1)[1] if not subs.startswith(self.extname): return "virtual/" + self.extname + "-" + subs return name if name.startswith("/") or (name.startswith("${") and name.endswith("}")): return name if not name.startswith(self.extname): return self.extname + "-" + name return name def map_depends(self, dep): if dep.endswith(("-native", "-native-runtime")) or ('nativesdk-' in dep) or ('cross-canadian' in dep) or ('-crosssdk-' in dep): return dep else: # Do not extend for that already have multilib prefix var = self.d.getVar("MULTILIB_VARIANTS") if var: var = var.split() for v in var: if dep.startswith(v): return dep return self.extend_name(dep) which apart from one getVar call, is all static string manipulation. Could be better, could be worse. Its surprisingly hard to manipulate a depends string (we also have to use the explode_deps function as the strings can have versions in). Secondly, I think we should assume that anyThat definitely looks like a good move and alleviates some of the concerns I have about it. I do wonder how we handle MULTILIB_VARIANTS in the above example. Perhaps some kind of cut down key/value pairs or allow parameters? I can dream up a few ways this might look. In this example, I'mThose are good suggestions. Thinking out loud, I'm kind of drawn to something which looks a bit more function like: DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')" as it might be more obvious and searchable through grep? I don't like ";" as a delimiter as it doesn't work well with append/prepend of values. Its a tradeoff between the need for some parameters to be passed in and complexity. I do like the fact we can cache this approach, that will help a lot with performance. There's probably a few other ways to express this, but the idea isIts definitely the way to go... Cheers, Richard |
|
Chris Laplante
DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')"Out of curiosity, do you anticipate that this will be used for any part of the implementation of multiconfig as well? Chris |
|
Richard Purdie
On Wed, 2020-04-22 at 13:45 +0000, chris.laplante@... wrote:
Not that I see at the moment. Only for native/nativesdk/multiconfigDEPENDS[filter] = "prefix_filter('${MLPREFIX}')Out of curiosity, do you anticipate that this will be used for any (anything using the class extension code). Cheers, Richard |
|
Joshua Watt
On 4/22/20 5:11 AM, Richard Purdie
wrote:
On Tue, 2020-04-21 at 21:51 -0500, Joshua Watt wrote:On Sun, Apr 19, 2020 at 12:04 PM Richard Purdie <richard.purdie@...> wrote: Continuing the discussion from the OE TSC meeting, now that I've had a little time to digest the idea: I don't think the filter functions are necessarily a bad idea. I think the real problem is how to prevent them from becoming API maintenance and debugging problems, and as you stated, performance issues. I think there might be a few ways to address this, and most of it comes down to strictly defining the API. Firstly, I don't think we want to allow (at least at this point) completely arbitrary python functions to be called as a filter. Hopefully, we can figure out the few filter manipulations that are required and design the API such that those can be used without having to open it all the way up.I think that is a sensible idea. The good news is we know roughly what the current needs look like, the code is in lib/oe/classextend.py: def extend_name(self, name): if name.startswith("kernel-") or name == "virtual/kernel": return name if name.startswith("rtld"): return name if name.endswith("-crosssdk"): return name if name.endswith("-" + self.extname): name = name.replace("-" + self.extname, "") if name.startswith("virtual/"): subs = name.split("/", 1)[1] if not subs.startswith(self.extname): return "virtual/" + self.extname + "-" + subs return name if name.startswith("/") or (name.startswith("${") and name.endswith("}")): return name if not name.startswith(self.extname): return self.extname + "-" + name return name def map_depends(self, dep): if dep.endswith(("-native", "-native-runtime")) or ('nativesdk-' in dep) or ('cross-canadian' in dep) or ('-crosssdk-' in dep): return dep else: # Do not extend for that already have multilib prefix var = self.d.getVar("MULTILIB_VARIANTS") if var: var = var.split() for v in var: if dep.startswith(v): return dep return self.extend_name(dep) which apart from one getVar call, is all static string manipulation. Could be better, could be worse. Its surprisingly hard to manipulate a depends string (we also have to use the explode_deps function as the strings can have versions in).Secondly, I think we should assume that any filter function is "pure" [1], meaning it only takes in the input variable value string, possibly some other manipulation arguments, and outputs the new variable value string. Notable, it should *not* take in the datastore object, since this is inherently global data that can change at any time. Hopefully, this will allow optimizations and prevent performance issues because bitbake can keep a cache of filter function inputs mapped to outputs. Since the function outputs *only* depend on the inputs, the filter can avoid being called multiple times if the input set has been seen before.That definitely looks like a good move and alleviates some of the concerns I have about it. I do wonder how we handle MULTILIB_VARIANTS in the above example. Perhaps some kind of cut down key/value pairs or allow parameters?I can dream up a few ways this might look. In this example, I'm defining a filter function that simply adds a defined prefix to every value in a space separated variable. First, the filter function itself: def filter_prefix(in, prefix): return ' '.join(prefix + v for v in in.split()) As for how to express this in a recipe, I can think of a few ways. This first one is a variable flag. The base flag is "filter" and the suffix indicates which filter is applied (in this case, the "prefix" function). The value are the extra arguments passed to the function. DEPENDS[filter:prefix] = "${MLPREFIX}" Another option would be to encode the function and arguments in the flag value, which might also allow filters to be chained (although, I would hope that's a rare requirement!): DEPENDS[filter] = "prefix ${MLPREFIX}" Chaining might look something like: DEPENDS[filter] = "prefix ${MLPREFIX}; suffix ${MY_SUFFIX}" The great part about pure functions here is that the property is transitive, so the entire result of chain with both operations (e.g. suffix(prefix(${DEPENDS}))) can be cached if desired.Those are good suggestions. Thinking out loud, I'm kind of drawn to something which looks a bit more function like: DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')" I was trying to avoid anything too complex to parse.... but I wonder if we can enlist the help of the python interpreter. If we have an appropriate delimiter, something like this might work: def apply_filter(value, filter_expression): class Val: def __init__(self, value): self.value = value def prefix(self, prefix): self.value = ' '.join(prefix + v for v in self.__value.split()) def suffix(self, suffix): self.value = ' '.join(v + suffix for v in self.__value.split()) v = Val(value) for e in filter_expression.split(';'): e = e.strip() if not e: continue eval(e, globals={}, locals={'v': v}) return v.value Obviously, you might want something a little more dynamic for registering the actual filter functions into the class, but the important part is that the eval() is pretty restrictive in what filter functions are allowed to access or do. This would allow the user to write a fairly natural expression like: DEPENDS[filter] = "v.prefix('${MLPREFIX}'); v.suffix('${MY_SUFFIX}') but, things like this also still work (since the code ignores empty expressions): DEPEND[filter] += "; v.prefix('foo')" The hardest part is finding a delimiter that isn't likely to show up in the actual variable values, since the string has to be split and duplicates ignored before it is parsed by the python code. as it might be more obvious and searchable through grep? I don't like ";" as a delimiter as it doesn't work well with append/prepend of values. Its a tradeoff between the need for some parameters to be passed in and complexity. I do like the fact we can cache this approach, that will help a lot with performance.There's probably a few other ways to express this, but the idea is that it's more tightly controlled about what can be applied. [1]: https://en.wikipedia.org/wiki/Pure_functionIts definitely the way to go... Cheers, Richard |
|
Quentin Schulz <quentin.schulz@...>
Hi all,
Giving my 2¢. On Wed, Apr 22, 2020 at 09:10:42AM +0100, Richard Purdie wrote: On Wed, 2020-04-22 at 08:21 +0100, Paul Barker wrote:Indeed, and the thing is... it does not fix the original issue, so thirdOn Sun, 19 Apr 2020 18:04:34 +0100One solution is to use ${PN} or ${MLPREFIX} everywhere. The multilib party layers not doing that will still have the same bug which is hard to detect. Since we can't unforce it, that seems to not be the solution. Also, this looks like a dangerous whack-a-mole. We know it breaks forWe've already got warnings for `${PN}` used in places where `${BPN}`This isn't possible as we have a lot of recipes generating packages LICENSE, does it break for something else we haven't found yet, are we going to forget on adding a warning once a new feature could have the same bogus behavior? It's like silencing the actual issue and giving us a false sense of safety. Thanks for looking into the issue and resuming much better the issue than in the book I sent as a bug report :) Quentin |
|
Quentin Schulz <quentin.schulz@...>
Hi all,
I still had a couple of ¢ to give so here they are. On Wed, Apr 22, 2020 at 09:30:23AM -0500, Joshua Watt wrote: I summoned the Google god and I found this stackoverflow on how to execute multiple lines instead of the one eval is expecting: https://stackoverflow.com/a/39381428 https://docs.python.org/3/library/ast.html If I didn't completely misunderstood what was suggested, that should eliminate the need for splitting. Though... I don't know how all of this works wrt basic security against code injection (e.g.: DEPENDS[filter] += "import shutil; shutil.rmtree('/')")? I barely know Python :) Just to try to grasp what's suggested, are all filter flags going to be resolved/executed at one specific point in time or will the filter of each variable be called explicitely when needed? If the former, my simple question is: aren't we just adding one more ordering issue somehow? I'm pretty sure someone will want one day to have this filter flag expanded/resolved at a different time that we would set it to. For the latter, seems error prone but at least we have the ability to fix a missing or misplaced filter rather easily. Hopefully I haven't completely misunderstood the suggestion. I'm afraid that's as far as I can participate to the issue from my eagle view on this as I'm lacking knowledge in Python and bitbake :/ Let me know if there's anything I can do, Thanks, Quentin |
|