On Wed, 2022-06-29 at 16:10 -0400, Aryaman Gupta wrote:
Stop the scheduler from starting new tasks if the
current CPU or IO pressure is above a certain threshold, specified
through the "BB_MAX_{CPU|IO}_SOME_PRESSURE" variables in conf/local.conf.
If the thresholds aren't specified, the default values are 100 for both
CPU and IO, which will have no impact on build times.
An arbitrary lower limit of 1.0 applies; lower values result in a fatal
error, to avoid extremely long builds. If the percentage limits are higher
than 100, the default values are used and warnings are issued to inform
users that the specified limit is out of bounds.
Signed-off-by: Aryaman Gupta <aryaman.gupta@...>
Signed-off-by: Randy Macleod <randy.macleod@...>
---
bitbake/lib/bb/runqueue.py | 39 ++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
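The bounds checking described in the commit message could look roughly like the following sketch. The function, constant and exception names are hypothetical. Where BitBake would call bb.fatal() and bb.warn(), the sketch raises an exception and uses the standard warnings module so that it runs outside BitBake. It also assumes that a limit of exactly 1.0 is accepted.

```python
import warnings

# Hypothetical constants mirroring the rules in the commit message.
DEFAULT_LIMIT = 100.0  # 100 means "never throttle"
MIN_LIMIT = 1.0        # assumed inclusive: 1.0 itself is accepted


class PressureLimitError(ValueError):
    """Stands in for the bb.fatal() call BitBake would make."""


def validate_pressure_limit(name, raw):
    """Return the effective limit for variable `name` given its raw value.

    `raw` is what getVar() would return: a string, or None if unset.
    """
    if raw is None or raw == "":
        return DEFAULT_LIMIT
    limit = float(raw)
    if limit < MIN_LIMIT:
        # Very low limits would stall the build almost indefinitely.
        raise PressureLimitError(
            f"{name}={limit} is below the lower limit of {MIN_LIMIT}")
    if limit > DEFAULT_LIMIT:
        # Out of range: warn (stand-in for bb.warn()) and fall back.
        warnings.warn(
            f"{name}={limit} is above {DEFAULT_LIMIT}; using the default")
        return DEFAULT_LIMIT
    return limit
```

For example, an unset variable yields 100.0, "15.5" yields 15.5, "250" warns and yields 100.0, and "0.5" raises PressureLimitError.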
This looks like a good start, thanks. There are a few things which will
need cleaning up in here as this is pretty performance sensitive code
(try a "time bitbake world -n" to see what I mean).
Firstly, it is a bitbake patch so it should really go to the bitbake
mailing list, not OE-Core.
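To put rough numbers on the per-read cost, a micro-benchmark along these lines compares the patch's approach (running cat through subprocess.check_output()) with a plain open() and read(). It is a sketch: the helper names are hypothetical, and it reads a temporary file filled with sample PSI-formatted text instead of /proc/pressure so that it also runs on hosts without PSI support (PSI first appeared in Linux 4.20 and needs CONFIG_PSI). It measures only the read itself; the effect on the scheduler as a whole still needs a full "time bitbake world -n" run.

```python
import os
import subprocess
import tempfile
import time

# Stand-in for /proc/pressure/io so the comparison runs on hosts without PSI.
SAMPLE = (
    "some avg10=1.23 avg60=0.50 avg300=0.10 total=12345\n"
    "full avg10=0.40 avg60=0.20 avg300=0.05 total=6789\n"
)


def time_fork_cat(path, iterations):
    """Seconds spent reading `path` by forking `cat`, as the patch does."""
    start = time.perf_counter()
    for _ in range(iterations):
        subprocess.check_output(["cat", path], universal_newlines=True,
                                stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def time_open_read(path, iterations):
    """Seconds spent reading `path` with a plain open() and read()."""
    start = time.perf_counter()
    for _ in range(iterations):
        with open(path) as f:
            f.read()
    return time.perf_counter() - start


def main(iterations=200):
    with tempfile.NamedTemporaryFile("w", suffix=".psi", delete=False) as tmp:
        tmp.write(SAMPLE)
    try:
        fork_s = time_fork_cat(tmp.name, iterations)
        open_s = time_open_read(tmp.name, iterations)
    finally:
        os.unlink(tmp.name)
    print(f"fork+exec cat: {fork_s / iterations * 1e6:8.1f} us per read")
    print(f"open()+read(): {open_s / iterations * 1e6:8.1f} us per read")
    return fork_s, open_s


if __name__ == "__main__":
    main()
```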
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 1e47fe70ef..9667acc11c 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -159,6 +159,27 @@ class RunQueueScheduler(object):
                 self.buildable.append(tid)
 
         self.rev_prio_map = None
+        # Some hosts like openSUSE have readable /proc/pressure files
+        # but throw errors when these files are opened.
+        try:
+            subprocess.check_output(["cat", "/proc/pressure/cpu", "/proc/pressure/io"], \
+                    universal_newlines=True, stderr=subprocess.DEVNULL)
+            self.readable_pressure_files = True
+        except:
+            if self.rq.max_cpu_pressure!=100 or self.rq.max_io_pressure!=100:
+                bb.warn("The /proc/pressure files can't be read. Continuing build without monitoring pressure")
+            self.readable_pressure_files = False
+
+    def exceeds_max_pressure(self):
+        if self.readable_pressure_files:
+            # extract avg10 from /proc/pressure/{cpu|io}
+            curr_pressure_sample = subprocess.check_output(["cat", "/proc/pressure/cpu", "/proc/pressure/io"], \
+                    universal_newlines=True, stderr=subprocess.DEVNULL)
+            curr_cpu_pressure = curr_pressure_sample.split('\n')[0].split()[1].split("=")[1]
+            curr_io_pressure = curr_pressure_sample.split('\n')[2].split()[1].split("=")[1]
This is horrible, you're adding in a fork() call for every pass through
the scheduler code. You can just open() and read the file instead which
will have *much* lower overhead.
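Following the open()/read() suggestion, the pressure check might look something like this sketch. The function names and the path parameters are hypothetical; the paths are parameters only so the code can be exercised against sample files. It picks the "some" line by its first word rather than by line index, because on older kernels /proc/pressure/cpu has only a "some" line, which shifts any fixed line index after it. On hosts like openSUSE, where the files exist but opening or reading them fails, this raises OSError, so a one-off probe at startup is still needed.

```python
def read_some_avg10(path):
    """Return the avg10 figure from the "some" line of a PSI file.

    PSI files contain lines such as:
        some avg10=1.23 avg60=0.50 avg300=0.10 total=12345
    The line is selected by its first word, not by its position.
    """
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "some":
                # Remaining fields are key=value pairs; pick out avg10.
                for field in fields[1:]:
                    key, _, value = field.partition("=")
                    if key == "avg10":
                        return float(value)
    raise ValueError(f"no 'some avg10=' entry found in {path}")


def exceeds_max_pressure(max_cpu, max_io,
                         cpu_path="/proc/pressure/cpu",
                         io_path="/proc/pressure/io"):
    """True if CPU or IO "some" pressure is above its limit.

    The IO file is not read at all when CPU pressure already exceeds its limit.
    """
    return (read_some_avg10(cpu_path) > max_cpu
            or read_some_avg10(io_path) > max_io)
```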
Even then, I'm not particularly thrilled to have this level of overhead
in this codepath. In some ways I'd prefer to rate limit how often we
look up this value rather than doing so on every pass through the
scheduler path. I'm curious what the timings say.
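One way to rate limit the lookup is to cache the answer for a fixed interval. This is a sketch under assumptions: the class name, the one-second default interval and the injectable clock are all hypothetical choices, not anything BitBake provides. The kernel only recomputes the avg10 figure periodically (roughly every two seconds), so sampling it on every scheduler pass mostly re-reads an unchanged value.

```python
import time


class RateLimitedPressureCheck:
    """Re-run an expensive pressure check at most once per `interval` seconds.

    `check` is any zero-argument callable returning True when pressure is
    over the limit; between samples the previous answer is returned.
    """

    def __init__(self, check, interval=1.0, clock=time.monotonic):
        self._check = check
        self._interval = interval
        self._clock = clock          # injectable so the behaviour can be tested
        self._last_sample = None     # time of the last real check, if any
        self._last_result = False

    def __call__(self):
        now = self._clock()
        if self._last_sample is None or now - self._last_sample >= self._interval:
            self._last_result = self._check()
            self._last_sample = now
        return self._last_result
```

In the scheduler this could wrap the patch's method, e.g. self.pressure_check = RateLimitedPressureCheck(self.exceeds_max_pressure) in __init__, with next_buildable_task() calling self.pressure_check(). The trade-off is that the scheduler may act on an answer up to one interval old.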
+
+            return float(curr_cpu_pressure) > self.rq.max_cpu_pressure or float(curr_io_pressure) > self.rq.max_io_pressure
+        return False
 
     def next_buildable_task(self):
         """
@@ -171,6 +192,8 @@ class RunQueueScheduler(object):
         buildable.intersection_update(self.rq.tasks_covered | self.rq.tasks_notcovered)
         if not buildable:
             return None
+        if self.exceeds_max_pressure():
+            return None
 
         # Filter out tasks that have a max number of threads that have been exceeded
         skip_buildable = {}
@@ -1699,6 +1722,8 @@ class RunQueueExecute:
 
         self.number_tasks = int(self.cfgData.getVar("BB_NUMBER_THREADS") or 1)
         self.scheduler = self.cfgData.getVar("BB_SCHEDULER") or "speed"
+        self.max_cpu_pressure = float(self.cfgData.getVar("BB_MAX_CPU_SOME_PRESSURE") or 100.0)
+        self.max_io_pressure = float(self.cfgData.getVar("BB_MAX_IO_SOME_PRESSURE") or 100.0)
I did wonder if this should be BB_PRESSURE_MAX_SOME_IO as the order of
the information kind of seems backwards to me. That could just be me
though! :)
Cheers,
Richard