
Re: Interactive password prompts and the git fetcher

Khem Raj
 

On Mon, Sep 3, 2018 at 9:18 PM Paul Eggleton
<paul.eggleton@...> wrote:

Hi folks,

In the layer index / RRS code I have found that if it tries to use bitbake's
git fetcher to access a git repository on github that has been renamed or gone
private via http/https, I get prompted for a password interactively,
presumably on the assumption that maybe if I authenticate I might be able to
see the repo. Here's an example git command that will trigger it (trimmed from
the actual command issued by the fetcher):

git ls-remote http://github.com/symless/synergy.git

This can be disabled by setting the environment variables GIT_ASKPASS to an
empty string and GIT_TERMINAL_PROMPT to 0, which allows the fetch command to
fail immediately. I could conceivably do this in the layer index / RRS
scripts, but it occurred to me that perhaps this should be disabled in
bitbake's git fetcher itself. We need to decide whether or not interactive
password prompts should be allowed during a fetch operation (I'm guessing not,
although I have a feeling this might be a "spacebar heating" situation for
some).

Thoughts? In particular, does anyone object to disabling such interactive
prompts during fetching?
+1


Re: Interactive password prompts and the git fetcher

Koen Kooi
 

On 4 Sep 2018, at 06:18, Paul Eggleton <paul.eggleton@...> wrote the following:

Hi folks,

In the layer index / RRS code I have found that if it tries to use bitbake's
git fetcher to access a git repository on github that has been renamed or gone
private via http/https, I get prompted for a password interactively,
presumably on the assumption that maybe if I authenticate I might be able to
see the repo. Here's an example git command that will trigger it (trimmed from
the actual command issued by the fetcher):

git ls-remote http://github.com/symless/synergy.git

This can be disabled by setting the environment variables GIT_ASKPASS to an
empty string and GIT_TERMINAL_PROMPT to 0, which allows the fetch command to
fail immediately. I could conceivably do this in the layer index / RRS
scripts, but it occurred to me that perhaps this should be disabled in
bitbake's git fetcher itself. We need to decide whether or not interactive
password prompts should be allowed during a fetch operation (I'm guessing not,
although I have a feeling this might be a "spacebar heating" situation for
some).

Thoughts? In particular, does anyone object to disabling such interactive
prompts during fetching?
Having run into similar issues a few times: let’s disable interactive prompts during fetching.

regards,

Koen


Interactive password prompts and the git fetcher

Paul Eggleton <paul.eggleton@...>
 

Hi folks,

In the layer index / RRS code I have found that if it tries to use bitbake's
git fetcher to access a git repository on github that has been renamed or gone
private via http/https, I get prompted for a password interactively,
presumably on the assumption that maybe if I authenticate I might be able to
see the repo. Here's an example git command that will trigger it (trimmed from
the actual command issued by the fetcher):

git ls-remote http://github.com/symless/synergy.git

This can be disabled by setting the environment variables GIT_ASKPASS to an
empty string and GIT_TERMINAL_PROMPT to 0, which allows the fetch command to
fail immediately. I could conceivably do this in the layer index / RRS
scripts, but it occurred to me that perhaps this should be disabled in
bitbake's git fetcher itself. We need to decide whether or not interactive
password prompts should be allowed during a fetch operation (I'm guessing not,
although I have a feeling this might be a "spacebar heating" situation for
some).
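
For illustration, a minimal sketch (in Python, since the fetcher is; not the actual bitbake fetcher code) of running git with those two variables forced in the environment:

```python
# Minimal sketch (not the actual bitbake fetcher code): run a git command
# non-interactively by disabling credential prompts via the environment.
import os
import subprocess

def run_git_noninteractive(args, cwd=None):
    env = dict(os.environ)
    env["GIT_ASKPASS"] = ""           # no askpass helper
    env["GIT_TERMINAL_PROMPT"] = "0"  # no terminal username/password prompt
    # With prompts disabled, git fails immediately on repos needing auth.
    return subprocess.run(["git"] + args, cwd=cwd, env=env,
                          capture_output=True, text=True)

result = run_git_noninteractive(["ls-remote", "http://github.com/symless/synergy.git"])
print(result.returncode, result.stderr.strip())
```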

Thoughts? In particular, does anyone object to disabling such interactive
prompts during fetching?

Cheers,
Paul

--
Paul Eggleton
Intel Open Source Technology Centre


Re: Trusted/secure/etc layers

Rich Persaud
 

On Aug 7, 2018, at 22:56, Trevor Woerner <twoerner@...> wrote:

Hi Rich,

On Mon, Aug 6, 2018 at 3:02 PM, Rich Persaud <persaur@...> wrote:
Several "boot integrity" videos have been posted, which cover SRTM and DRTM:

Thanks for keeping us updated on the progress. I watched Bruce's video last week; his talk and the quality of the video were great. Thanks for making these public!!

At today's Technical Meeting, someone mentioned that Google recently announced some sort of security framework? Does anyone have a project name or links?

Google Cloud Shielded VMs, see #3 and #4 below.


Rich

---

Summary:  services can authorize device/VM connection if the managed device/VM can prove through certificates and/or runtime cryptographic measurements that it is executing a known-good stack of hardware, firmware and software.  This can be done on a resource basis, e.g. access to a sensitive spreadsheet or document, within a "zero trust" or "software defined perimeter" network architecture.


2. Cisco/Duo:  https://duo.com/blog/beyondcorp-for-the-rest-of-us

Google Cloud:






Microsoft Azure-Windows:

7. Hardening with Hardware:  https://youtube.com/watch?v=8V0wcqS22vc


9. Azure-Windows runtime attestation ("leveraging device and user trust claims to gate access to organizational resources"): https://cloudblogs.microsoft.com/microsoftsecure/2018/06/14/building-zero-trust-networks-with-microsoft-365/

10. Azure Sphere for IoT (hardware RoT, 10 years of Microsoft Linux security updates): https://azure.microsoft.com/en-us/blog/introducing-microsoft-azure-sphere-secure-and-power-the-intelligent-edge/

11. MS OpenCompute Cerberus: https://youtube.com/watch?v=wCMplDQLsfw



(small advertisement:  OpenXT uses OpenEmbedded in an open-source platform for device assurance and runtime attestation, so that vendors and customers can make geographic and industry-specific choices for device keys, certificates and trust hierarchies.  Some features, e.g. measured boot, are being unbundled so they can be used by other projects.  We are working towards an upstream-first, CI workflow with OE meta-virtualization and Xen)


Re: Trusted/secure/etc layers

Trevor Woerner
 

Hi Rich,

On Mon, Aug 6, 2018 at 3:02 PM, Rich Persaud <persaur@...> wrote:
Several "boot integrity" videos have been posted, which cover SRTM and DRTM:

Thanks for keeping us updated on the progress. I watched Bruce's video last week; his talk and the quality of the video were great. Thanks for making these public!!

At today's Technical Meeting, someone mentioned that Google recently announced some sort of security framework? Does anyone have a project name or links?

Thanks.


Re: Trusted/secure/etc layers

Rich Persaud
 

On May 3, 2018, at 23:46, Rich Persaud <persaur@...> wrote:

On May 2, 2018, at 18:44, Tom Rini <trini@...> wrote:

On Wed, May 02, 2018 at 06:26:31AM +0200, Patrick Ohly wrote:
Trevor Woerner <twoerner@...> writes:
Philip Tricca (for Intel) has been leading a lot of TSS work as well, he also
maintains meta-measured https://github.com/flihp/meta-measured for OE recipes.

Note that he is very open to the idea of moving those recipes elsewhere
and deprecating meta-measured. The recipes themselves aren't getting
updated as often as they used to be, too.

Trying to break this down into small tasks, there are a number of cases
for making use of GRUB_BUILDIN in the context of security (Intel TXT, UEFI
secure boot, I bet the AMD equivalent of TXT).  So we have the generic
hook we need (good!).

Intel TXT (DRTM) [1]  and UEFI secure boot (SRTM) each offer specific security benefits.  They can also be combined [2].

To achieve security benefits with DRTM, there's a bit more involved than grub. E.g. what are you measuring?  If you want to measure the rootfs, it needs to be read-only.  If you OTA update and change the measurement, the device fails to boot the new version.  We've solved some of these issues in one configuration of OE with Xen [3].

DRTM depends on correctly implemented system BIOS.  After we started testing DRTM on endpoints in early 2010s, it took a few years and bricked devices before OEMs worked through the BIOS issues.  DRTM BIOS on servers is likely earlier in the maturity cycle.  Two devices, same CPU generation (BSP), can have different OEM BIOS quirks.  Windows 10 SystemGuard DRTM [4] may improve UEFI and OEM BIOS validation.

There's a conference [2] on these topics in May. Philip Tricca and Bruce Ashfield will be presenting, among other OE contributors.

Several "boot integrity" videos have been posted, which cover SRTM and DRTM:

Rich


Re: Platform Security Summit 2018

Rich Persaud
 

Many thanks to Bruce Ashfield for his comprehensive overview of the meta-virtualization layer!  The presentation has been posted:


Rich

On May 2, 2018, at 14:29, Rich Persaud <persaur@...> wrote:

PSEC 2018 brings together security researchers and developers from the open-source ecosystems of OpenEmbedded, Xen Project and OpenXT, including presentations by OE contributors Intel, Wind River and Star Lab.

With a focus on hardware-based security and commercially extensible open source, this 2-day, single track event is for hardware and firmware engineers, VMM and OS developers, security architects, integrators and senior technical staff.  

Presentation abstracts, technical references and registration details are available at https://platformsecuritysummit.com

Rich


Re: A journey into parallelisation of oe-selftest

Richard Purdie
 

On Sun, 2018-07-15 at 12:56 +0100, Richard Purdie wrote:
I've since discovered that one of the systemd_boot tests took around
5 hours and the rest of the tests finished within about an hour (the
sstate cache was hot). I strongly suspect this is due to its use of "-c
cleansstate", which on the autobuilder with its huge sstate store on
NFS is very very slow. There is a patch pending to remove that as
tests should never be deleting from sstate anyway.
As if to prove my point we just happened to see this exact failure:

https://autobuilder.yocto.io/builders/build-appliance/builds/1118

which is caused by an oe-selftest for sumo that was running in parallel.
To speed up builds and improve build safety I'll ensure that fix gets
backported.

Cheers,

Richard


A journey into parallelisation of oe-selftest

Richard Purdie
 

I thought I'd write down my findings as I looked into parallelising oe-
selftest, both so there is a record of them somewhere I can point
people at if they have questions and so that others may be able to
build from this in future.

There are basically three ways we could parallelise it:

a) High level wrapper which lists the modules, then runs each module as
"oe-selftest -r X" in its own process. Leo did this using GNU parallel.
On the plus side it's simple; on the downside the logging and results
reporting is poor.

http://lists.openembedded.org/pipermail/openembedded-core/2017-December/145743.html

b) Use threads. Anibal tried this and whilst it works to a point and we
have some code in oeqa for it, it was starting to need special handling
in many places. It was found to be problematic for some tests which had
to run in the main thread and doesn't work with things like tinfoil or
potentially memory resident bitbake. The more we went down that rabbit
hole, the more I worried it was getting too complex and wasn't scaling
well.

http://git.yoctoproject.org/cgit.cgi/poky-contrib/log/?h=alimon/oe_selftest_threaded

c) Use multiple processes but connect them to a central unittest
instance. There is prior art for this "upstream" in python-testtools
and its concurrenttestsuite. This allows an existing unittest setup to
be retrofitted with multiple process execution. Sounds great!

Others have tried a/b and we'd found issues so I decided to look at c
which we hadn't tried.

One key piece of parallelising this is that the different tests need to
run in different build directories. This means we need a mechanism to
copy the default build config into a new build directory. There are
some implementation decisions to be made here. We could have a
directory for every test however there is a lot of setup that goes into
these directories with extraction of things from sstate. In summary
I've chosen to have a build directory for each process and each shares
a common DL_DIR and SSTATE_DIR. The build directory is reused for each
test in a given process.
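
As a rough sketch of that layout (names and paths are illustrative, not the actual oeqa code), each worker would get its own copy of the configuration while pointing at the shared directories:

```python
# Rough sketch of the per-worker build directory setup (illustrative names,
# not the actual oeqa implementation): each worker gets its own conf/ copy,
# with DL_DIR and SSTATE_DIR pointed at shared locations.
import os
import shutil

def setup_worker_builddir(base_builddir, worker_builddir, dl_dir, sstate_dir):
    os.makedirs(worker_builddir, exist_ok=True)
    shutil.copytree(os.path.join(base_builddir, "conf"),
                    os.path.join(worker_builddir, "conf"))
    # Reuse the codeparser cache where possible (paths differ, so only a
    # chunk of it helps, but it avoids some reparsing cost).
    cache = os.path.join(base_builddir, "cache")
    if os.path.isdir(cache):
        shutil.copytree(cache, os.path.join(worker_builddir, "cache"))
    with open(os.path.join(worker_builddir, "conf", "local.conf"), "a") as f:
        f.write('\nDL_DIR = "%s"\n' % dl_dir)
        f.write('SSTATE_DIR = "%s"\n' % sstate_dir)
```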

One factor in making this decision was realising how hard the machine
gets hammered when we initially launch the tests and 15 build
directories all parse the metadata in parallel. Parsing hits 100% CPU
usage with one build directory, so you can imagine what 15 builds do.
We can mitigate the problem slightly by copying in the build/cache
directory which contains the codeparser cache. Not all the cache will
be reused due to the path differences but a large chunk can. The main
cache directory is pointless to copy as the path/config differences
mean it would reparse anyway. Having a builddir per process at least
means the cache can be reused there rather than having a reparse per
test.

Looking more closely at concurrenttestsuite (cts), it has advantages in
that it wraps around standard python unittest, but this also brings some
drawbacks as some of the games it plays to ensure that compatibility are
rather ugly. There are alternative "streaming" variants in testtools
but those mean moving off standard unittest so I've avoided them and
decided to see how far we can get with cts.
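
For reference, a bare-bones illustration of the cts shape (the real retrofit forks processes and streams results back via subunit; this only shows the split/run structure, and the discover() path is illustrative):

```python
# Bare-bones illustration of the python-testtools ConcurrentTestSuite API
# that the oeqa retrofit builds on. The real implementation forks processes;
# ConcurrentTestSuite itself runs each returned bucket in a thread.
import unittest
import testtools

def flatten(suite):
    """Yield individual test cases from an arbitrarily nested TestSuite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from flatten(item)
        else:
            yield item

def split_suite(suite, nprocs=4):
    """make_tests callback: divide the flat test list into N buckets."""
    buckets = [unittest.TestSuite() for _ in range(nprocs)]
    for i, test in enumerate(flatten(suite)):
        buckets[i % nprocs].addTest(test)
    return buckets

loader = unittest.TestLoader()
suite = loader.discover("meta/lib/oeqa/selftest/cases")  # path is illustrative
concurrent = testtools.ConcurrentTestSuite(suite, split_suite)
concurrent.run(testtools.TestResult())
```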

One big implementation-detail win is that when a parallelisation
option isn't passed, the oeqa code can behave as it does today and
doesn't need to import testtools (or subunit which it depends upon).
This means we can have these as optional OE dependencies which you only
need if you specify the -j option. I was able to implement most of what
we need in concurrencytest.py without many changes to the core. Some of
that "retrofitting" is ugly and some needs to move into the core as a
first-class citizen, but it's good for testing and a sign the abstraction
is good.

The way cts works, it has to create a dummy "startTest" event later
once the result is available. It stores its own internal timestamps
about the test duration and accessing them is basically a pain. It's
useful to mark up the test results with start/end times so that we can
debug which tests are taking the most time.

Initially when I implemented this, I was shocked to see oe-selftest
still taking several hours despite the signs being good that it was
running in parallel.

I've since discovered that one of the systemd_boot tests took around 5
hours and the rest of the tests finished within about an hour (the
sstate cache was hot). I strongly suspect this is due to its use of "-c
cleansstate", which on the autobuilder with its huge sstate store on NFS
is very very slow. There is a patch pending to remove that as tests
should never be deleting from sstate anyway.

The "scheduling" algorithm used by the tests is also not ideal, it just
puts things into N buckets, then passes the list of tests to the
subprocess. This means if one process burns through its list quickly,
it will just finish. The argument is that the test runtimes should
average out in general and to an extent they do.

I ended up choosing to schedule things based on groups at the test
'class' level (tests are of the form module.class.testname) since this
lets us split things up yet keep similar things in the same build
directory for efficiency. This will likely mean we need to tweak some
of the test groupings, in particular I have a patch which splits
devtool up since it was taking by far the longest of the test modules
to complete. Wic will likely need splitting into two but that meant
fixing its hardcoded racy reference to /var/tmp/wic-selftest.
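
A toy sketch of that class-level grouping (purely illustrative, not the shipped scheduler):

```python
# Sketch of the class-level grouping idea: tests named module.class.testname
# are kept together so one worker reuses the same build directory for a whole
# class, while the classes themselves are spread over N processes.
from collections import defaultdict

def group_by_class(test_ids, nprocs):
    groups = defaultdict(list)
    for tid in test_ids:
        module, cls, _ = tid.rsplit(".", 2)
        groups[(module, cls)].append(tid)
    buckets = [[] for _ in range(nprocs)]
    # Naive round-robin; real scheduling could weigh known class runtimes.
    for i, (_, ids) in enumerate(sorted(groups.items())):
        buckets[i % nprocs].extend(ids)
    return buckets

print(group_by_class(["wic.Wic.test_version", "wic.Wic.test_help",
                      "devtool.DevtoolTests.test_create_workspace"], 2))
```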

I found some tests are making changes to metadata as part of the tests.
For this reason meta-selftest is copied into each build directory. I
believe there may still be issues in devtool in particular with changes
to the meta/ directory being temporarily made.

There are a raft of other tweaks I've ended up making such as forcing
all the tests to run in "buffer" mode to capture stdout/stderr and only
show that upon failure and fixes for various races these changes are
highlighting.

I also found subunit assumes sys.stdout.buffer exists, which it doesn't
for unittest's buffer mode since that uses io.StringIO; I ended up hacking
around that.
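
One possible shape of that workaround (an assumption about the approach, not the actual patch) is to give the buffered StringIO a .buffer attribute that decodes what subunit writes:

```python
# Possible shape of the workaround (assumption, not the actual oeqa change):
# expose a .buffer on the buffered io.StringIO that decodes the bytes
# subunit writes and forwards them to the underlying text stream.
import io

class BufferedStringIO(io.StringIO):
    class _BytesView:
        def __init__(self, parent):
            self._parent = parent
        def write(self, data):
            return self._parent.write(data.decode("utf-8", "replace"))
        def flush(self):
            self._parent.flush()

    @property
    def buffer(self):
        return self._BytesView(self)
```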

My main issue now is how to neatly log to the console. The timings
output I've developed for debugging is noisy but also very effective,
and I'm considering leaving it on all the time.

An example 57 minute selftest run with a hot sstate cache, with the
confusing output:

https://autobuilder.yocto.io/builders/nightly-oe-selftest/builds/1189/steps/Running%20oe-selftest/logs/stdio

In builds in parallel with other things, it seems this time rises to
around 2 hours which is still much better than it used to be. In that
output you can see entries like "X:Y/Z (time taken) (testname)" where X
is the process number (it had 15 processes) and Y/Z is test number Y of
a total of Z in that process.

In summary, I think this approach is going to be the way to make oe-
selftest work better on our infrastructure and I'll propose a patch
series accordingly. As part of that I'll propose we remove the existing
threading code as this new approach should replace it.

Cheers,

Richard


Re: [yocto] OEDEM Sunday, 21 October 2018 in Edinburgh (before ELCE)

Armin Kuster
 

On 06/28/2018 01:46 PM, Philip Balister wrote:
OpenEmbedded is holding a developer meeting in Edinburgh before ELCE.
Anyone with an interest in OpenEmbedded development is welcome to
attend. The core developers enjoy hearing about the needs and problems
of the larger community.

At this point we are looking for meeting space and it would be a huge help to
have a better idea of the headcount. Please add yourself if you intend
to attend the meeting. (If you drop out later, go ahead and remove yourself)

The RSVP web page is:

https://www.openembedded.org/wiki/OEDEM_2018

Searching the wiki for OEDEM should get you info from topics discussed
at past meetings.

If you need to create an account, approval might be delayed a bit as I
am on vacation ...
or send me your RSVP and I can add you to the wiki page.

regards,
Armin

Philip


OEDEM Sunday, 21 October 2018 in Edinburgh (before ELCE)

Philip Balister
 

OpenEmbedded is holding a developer meeting in Edinburgh before ELCE.
Anyone with an interest in OpenEmbedded development is welcome to
attend. The core developers enjoy hearing about the needs and problems
of the larger community.

At this point we are looking for meeting space and it would be a huge help to
have a better idea of the headcount. Please add yourself if you intend
to attend the meeting. (If you drop out later, go ahead and remove yourself)

The RSVP web page is:

https://www.openembedded.org/wiki/OEDEM_2018

Searching the wiki for OEDEM should get you info from topics discussed
at past meetings.

If you need to create an account, approval might be delayed a bit as I
am on vacation ...

Philip


Re: sstate equivalency

Joshua Watt
 



On Sun, Jun 10, 2018, 04:38 Martin Jansa <martin.jansa@...> wrote:
This description sounds like there would be one global equivalency server, but I would expect this to be implemented in a similar way to how the bitbake PRserv is currently used - so that each project or user can define which equivalency server to use and where to report the hashes.

I wasn't sure, so I approached it from the idea of there being a public server, perhaps populated by the nightly builds. I think it is probably more useful if there is a public one (more like the sstate cache than a PRserv). I don't think it would be hard to do multiple levels of equivalence servers so that an internal one can be maintained in preference to the public one (sort of like how you can have "levels" of sstate cache), or there is only one but they do deferrals to a higher server on a miss (like DNS).
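
As a hypothetical sketch of that layered lookup (server URLs are made up; the endpoint shape follows the draft proposal quoted below, not an existing bitbake API):

```python
# Hypothetical sketch of the "levels of equivalence servers" idea: ask the
# local/internal server first and fall back to the public one on a miss.
# Server URLs and the endpoint are illustrative, based on the draft proposal.
import json
import urllib.parse
import urllib.request

SERVERS = ["http://hashserv.internal.example.com",
           "http://hashserv.public.example.com"]

def lookup_outhash(taskhash, method="basic"):
    query = urllib.parse.urlencode({"taskhash": taskhash, "method": method})
    for server in SERVERS:
        with urllib.request.urlopen("%s/v1/outhashes?%s" % (server, query)) as resp:
            data = json.loads(resp.read())
        if data.get("outhash"):
            return data["outhash"]
    return None  # miss everywhere: build it and report afterwards
```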


I'm also surprised that there is no indication of the recipe, architecture or version for these hashes, just one hash for everything, which is good for anonymity of the submitted hashes, but it also means that if I have a very simple component like systemd-serialgetty:
sstate-cache/dd/sstate:systemd-serialgetty:qemux86-oe-linux-musl:1.0:r5:qemux86:3:ddf17a1eff3be1ff6a7ff393bbbf289b_package.tgz
then the outhash can be the same as what the qemux86copy MACHINE builds, but how is bitbake supposed to find out that, instead of building ddf17a1eff3be1ff6a7ff393bbbf289b, it can re-use the sstate-equivalent file:
sstate-cache/ee/sstate:systemd-serialgetty:qemux86copy-oe-linux-musl:1.0:r5:qemux86copy:3:eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee_package.tgz
?

From one point of view it is great that we can re-use already-built sstate even if it was built for a completely different MACHINE and possibly even a different recipe and version, as long as the outhash is identical, but it blows up really badly once you get some conflicts in outhash calculations (and it really depends on what we'll include in the outhash: first we try to ignore timestamps, then users, then we might be interested only in the ABI for a given library).

I'm not sure if you are referring to the input hashes or the output hashes (from the servers perspective), so I'll address both.

I went back and forth a lot on the input hashes. It isn't hard to include all that information and still be anonymous: you just hash all of the relevant information together (MACHINE, BB_TASKHASH, etc.) and send that as the request hash to the server. I had this in a previous draft, but I think I decided that BB_TASKHASH should include all the relevant variables anyway, so it was unnecessary (and only made it more complicated). I don't know that I have a strong opinion either way, so if there is a good reason to go the other way, I'm fine with that.
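
A toy illustration of that "hash it all together" idea (variable names are illustrative, not bitbake's):

```python
# Toy illustration of hashing all the relevant information together into a
# single anonymous request hash (names are illustrative, not bitbake's).
import hashlib

def request_hash(taskhash, machine, extra_vars=()):
    h = hashlib.sha256()
    h.update(taskhash.encode())
    h.update(machine.encode())
    for value in extra_vars:
        h.update(value.encode())
    return h.hexdigest()

print(request_hash("ddf17a1eff3be1ff6a7ff393bbbf289b", "qemux86"))
```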

On the output hashes: It is a little more nuanced than that. For the sake of discussion, let's say that we calculate "TASKHASH_A" as the hash for one task named "Task A" (perhaps ddf17a1eff3be1ff6a7ff393bbbf289b), and "TASKHASH_B" for another task named "Task B" (perhaps eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee). Let's say these two tasks are equivalent, but no one has discovered that yet. Now, an equivalence-aware executor builds Task A, and then calculates its output hash as OUTHASH_C. When it goes to write the sstate object, it doesn't write it out using TASKHASH_A, but instead writes it out using OUTHASH_C. The executor then reports that TASKHASH_A is equivalent to OUTHASH_C and moves on. A later executor now needs to build Task B. At this point, the fact that TASKHASH_A is equivalent to OUTHASH_C doesn't help, because it is unknown that TASKHASH_B is equivalent to OUTHASH_C (that can't be known until Task B is built). The task executes, the output hash is calculated as OUTHASH_C, the sstate object is written (if it doesn't already exist!), and the equivalence of TASKHASH_B to OUTHASH_C is reported.

Now that this general procedure is outlined, let us consider 2 cases:

Case A) Task A and Task B are two independent tasks. That is, let's say Task A is the task for MACHINE qemux86 and Task B is for the MACHINE qemux86copy, as you described above. In this case, the two tasks will never "converge", that is, bitbake will never be able to use the sstate object for Task A in place of the sstate object for Task B, but... *that doesn't matter*. The object of equivalence is to reduce rebuilds, not necessarily to reduce the sstate cache size. It is impossible to have known that Task A and Task B are equivalent without someone having actually *built* both of them, so even though they are equivalent, *both objects should have been written to the sstate cache*. A later executor will simply restore either one from sstate (based on OUTHASH_C reported by the equivalence server, of course) without taking into account that they happen to be equivalent. Additionally, I think that the overhead and comprehension required to maintain such a mapping between independent tasks and reason about what is going on would be much higher than saving a little space in sstate.

Case B) Task B is actually a later version of Task A. In this case, the two tasks do indeed "converge" because when a later executor goes to build Task B, it will be told by the equivalence server that it is equivalent to OUTHASH_C, and will find the sstate object written by Task A, since it will match exactly, thus preventing the need to rebuild. Keen observers will of course note that my logic from Case A is still applicable here: namely that both Task A and Task B *must* have been built once to determine their equivalence, and therefore both of their "classic" sstate objects (those that use BB_TASKHASH instead of OUTHASH_C) would have been in the sstate cache. And since I stated that the object of equivalence isn't to reduce the size of the sstate cache, why even bother with translating to OUTHASH_C at all? This is all true, and leads into the next question of preventing downstream task rebuilds, which I answer more fully below. As far as why use an sstate object named with OUTHASH_C instead of BB_TASKHASH, in this case (where Task A and Task B are different versions of the same task), the overhead of such a mapping is much lower than in the independent case, and it is also much easier to reason about what is going on so it makes more sense to maintain. Additionally, naming the sstate objects by output hash has one additional benefit: Anyone (with permission) can report a new input hash that maps to that output hash and it could be useful to another executor (e.g. they might actually find an sstate object named with the output hash already existing in sstate). If you name the sstate objects with BB_TASKHASH, then the only executors that can generate useful equivalence mappings are those who end up populating the shared sstate caches.

All that being said, I wouldn't necessarily mind reporting the full sstate object path (e.g. ee/sstate:systemd-serialgetty:qemux86copy-oe-linux-musl:1.0:r5:qemux86copy:3:eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee_package.tgz) instead of just the basic output hash. This wouldn't change anything detailed above, and it wouldn't be used to "converge" the tasks as mentioned above, but rather it would give the humans looking at what is going on a much better way to debug the system. I don't particularly feel that output hashes need to be as anonymized as the input hashes, because reporting output hashes is most assuredly an "opt-in" sort of thing. If you can maintain a private server, you can prevent reporting anything to the public server. A better option than even that would be (I think) to actually enable a "report additional statistics" option when you report an equivalent hash. If enabled, all sorts of very useful data could be uploaded to the equivalence server, such as the information that goes into the sstate object path, perhaps the task siginfo, etc. Again, I don't necessarily want to derail the discussion with what we *could* do (as I realize this could, and should, be a sensitive topic), but keep it in mind because it will probably come up again. Rest assured, I'm not going to do anything behind anybody's back (or at all if that's the consensus): input and output hashes are all we really need.

Of course, none of this actually prevents downstream task rebuild, as you pointed out below.... 


And I'm still missing how this will prevent rebuilding e.g. chromium against new zlib even when the older zlib is sstate equivalent - we will reuse the older sstate from the older zlib build instead of rebuilding zlib, but what taskhash will OEBasicHash use when computing the new hash of chromium.do_configure? It should use the same one as in the previous chromium build (if they are in the same equivalency group), because if it uses any other one, then we'll end up with chromium rebuilding even when it's against identical zlib with just a different hash.

I should have made this more clear, but it doesn't. That is the end goal. However, you need this piece in place first, and in the interest of breaking up the problem into manageable chunks, this seems to be the direction I've been steered by previous discussions (which I completely agree with).

Once you have the hash equivalence structure in place, I believe that the general idea is to calculate task dependencies based on the output hash in place of the task hash. This means that when the equivalence server reports that zlib's output hash hasn't changed, chromium.do_configure doesn't re-run because its dependency on zlib is based on zlib's outhash, not its task hash. This will require some core bitbake work and is going to be far more complicated than this (relatively) simple tracking of persistent equivalence, so it seems better to get this part out of the way first.

One interesting note: Let's go back to Case B... if bitbake did calculate dependencies by output hash, and Task A and Task B did have the same output hash, then some other task that depended on the task provided by Task A and Task B would be able to use either interchangeably.... which is pretty awesome.

Thanks,
Joshua Watt (Champion of the comma splice)


On Fri, Jun 8, 2018 at 5:09 AM Joshua Watt <jpewhacker@...> wrote:
Hello all,

I've been looking into the sstate equivalency mechanism discussed
[previously][1], and decided that it is probably time to report what I
am proposing, so that more discussion can be had.

[1]: http://lists.openembedded.org/pipermail/openembedded-architecture/
2018-May/000745.html


Please make note of the **QUESTION** lines, as they indicate
(known) outstanding issues that haven't been sorted out yet.

Apologies for the length. You can also see this in all of its markdown
glory
[here](https://gist.github.com/JPEWdev/506b70157cdbb59454e445fe71a57c7e
)

# Hash Equivalency Server

This document proposes a design for a Hash Equivalence Server that can
be used to centralize the records of what task hashes can be considered
equivalent. The server implements a REST HTTP API based on JSON data
to report to bitbake clients what task hash can be considered
equivalent on request, and also allows selected clients to report new
known equivalent hashes.

The hash equivalency server identifies task hashes based on these main
attributes:

1. `taskhash`: *Task Hash* - This is the value `${BB_TASKHASH}` that
   bitbake calculates from the inputs to the hash
2. `method`: *Output Hash Method* - The string method used to
   calculate the output hash. These will be described in detail later.
3. `outhash`: *Output Hash*: The hash derived from the build outputs
   using the `method`

**QUESTION** Do we need the `method`? Presumably, a hash is a hash and a
hash is unique, so the method is unnecessary since different hashing
methods will produce different hashes from the same input, but at the
same time we also *expect* a different `outhash` when the `method`
differs.

The server may track and report other additional information about the
task hashes, but the items described above are the minimum required.

Once bitbake has calculated the task hash and determined that it needs
to rerun a task, it can send an HTTP GET request to the Hash
Equivalency Server to ask if it knows of any equivalent tasks that can be
used in place of rebuilding. This GET request is formulated as:

    GET /v1/outhashes?taskhash=<TASKHASH>&method=<METHOD>

In this request, `<TASKHASH>` and `<METHOD>` are replaced by the
appropriate information about the task for which bitbake is making the
request.

**QUESTION** I think we only need to request equivalent hashes for
things that have to be rebuilt... is there any reason to request them
for all tasks regardless of if they need to be rebuilt or not?

**QUESTION** You could pretty easily get some useful statistics on how
many users are (re)building a given taskhash. I know that's a more
controversial topic that I would mostly like to avoid (for now), but
keep it in mind... In general I think having a hash equivalency server
opens up a lot of those kinds of possibilities.

The response from the GET request will return the HTTP 200 OK code,
and the result body will be a JSON object that describes the
equivalent hash. At a minimum, the object will provide the `outhash`
field, as this is required for bitbake to discover a matching sstate
object. In the event there are no equivalent hashes, an empty object
is returned. The following is an example of the returned JSON:

```json
{
    "outhash": "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
}
```

**QUESTION** Is there any reason to return more than one equivalent
hash?

Once an output hash is discovered for a task, the sstate task will use
it in place of the `BB_TASKHASH` variable for the purpose of looking
up sstate objects.

# Writing new sstate objects

In the event that a bitbake executor cannot find an equivalent hash
(either because no output hashes were reported, or no such sstate
objects exist), it will end up performing the build, then needing to
write out a new sstate object. These new sstate objects will always be
written using the output hash in the file name (in place of the
current `BB_TASKHASH`). These output hashes are calculated using one
of the hash calculation methods described below.

**QUESTION** Is there any desire to be able to do a build later
without the hash equivalency server? I don't think it would be
terribly difficult to have the sstate cache create symbolic links to
the equivalent task hashes with the name of the original task hash so
that builders who are unaware of the equivalency server can still find
the sstate objects.

# Reporting New Equivalent Hashes

A bitbake executor can report newly discovered equivalent hashes to
the hash equivalency server also via a REST API. These hashes can be
reported at the time the sstate object is written, using a POST
request:

    POST /v1/taskhashes

The body of the request must contain an array of JSON objects that
describe the equivalent hashes. At a minimum, each object must
contain the `taskhash`, `method`, and `outhash` fields. For example:

```json
[
    {
        "taskhash": "c7c51be323b56305e1e7e74f4011e810fd340908",
        "method": "basic",
        "outhash": "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
    },
    {
        "taskhash": "e7a3049fa3a27861d4b54d29b2e3b781f0dbf304",
        "method": "basic",
        "outhash": "114aa9bac59642fa951a0fed8359930fc602679d"
    }
]
```

**QUESTION** What do we do if multiple different output hashes are
reported for the same taskhash? This is actually pretty useful
information, since it would indicate a non-reproducible build.

**QUESTION** You can go pretty crazy with reporting here....
Hopefully, the three required fields are anonymous enough that there
are no qualms with reporting them.  Anything beyond that would
obviously be enormously helpful from a data-mining and statistics
perspective. You wouldn't be able to keep that anonymous enough, so it
would *have* to be opt-in. Again, not the focus of this discussion,
but keep it in mind

## Hash reporting security

The use of a public hash equivalency server has the potential to
introduce serious security flaws into consumers that choose to use it.
For example, it would be trivial for a malicious user to upload an
outhash for an unpatched version of software as being equivalent to
the patched version's taskhash. As such, it is important that the
identity of the source of all reported taskhashes be confirmed before
they are submitted to the database. To confirm the identity of posted
task hashes, users who wish to supply hashes must first upload a
public RSA key to the server which is tied to a username. When the
user wishes to POST a hash, the HTTP body doesn't contain the raw
JSON, but instead contains a JSON Web Signature string (per
[RFC7515](https://tools.ietf.org/html/rfc7515)), signed with the user's
private key. In order for the server to identify which user/key should
be used to verify the signature, the message should contain the "kid"
parameter in the unprotected header that is set to the username
corresponding to the signing key. Additionally, the request must also
include a numeric "nonce", field in the protected header. This nonce
value must be numerically larger than the last reported nonce by that
user, or the request will be rejected with an error. In this case,
bitbake will be made aware of the error and the last reported nonce
value so it can retry with a higher value.

The server will store the username associated with all reported hashes
to allow for easy revocation of hashes in the event of a private key
breech, or a request to clear out reported hashes from the user for
privacy reasons.

**QUESTION** Web technology isn't really my thing... is there a better
way?

# Hashing Methods

The hashing equivalency code needs to be provided with a hashing
algorithm to determine if two taskhashes are equivalent. The default
algorithm provided with the implementation will be named "basic" and
calculates a SHA1 using the algorithm found
[here](https://gist.github.com/JPEWdev/d5e8d339d6d33a505fee1fd049994262
)

**QUESTION** It is important to note what that algorithm does hash
(file path, mode, file type, contents), but also what it *doesn't*
hash (namely the owner/group and timestamps). I think this is what we
would like. Any suggestions?


Thanks,
Joshua Watt
_______________________________________________
Openembedded-architecture mailing list
Openembedded-architecture@...
http://lists.openembedded.org/mailman/listinfo/openembedded-architecture


Re: sstate equivalency

Martin Jansa
 

This description sounds like there would be one global equivalency server, but I would expect this to be implemented in a similar way to how the bitbake PRserv is currently used - so that each project or user can define which equivalency server to use and where to report the hashes.

I'm also surprised that there is no indication of the recipe, architecture or version for these hashes, just one hash for everything, which is good for anonymity of the submitted hashes, but it also means that if I have a very simple component like systemd-serialgetty:
sstate-cache/dd/sstate:systemd-serialgetty:qemux86-oe-linux-musl:1.0:r5:qemux86:3:ddf17a1eff3be1ff6a7ff393bbbf289b_package.tgz
then the outhash can be the same as what the qemux86copy MACHINE builds, but how is bitbake supposed to find out that, instead of building ddf17a1eff3be1ff6a7ff393bbbf289b, it can re-use the sstate-equivalent file:
sstate-cache/ee/sstate:systemd-serialgetty:qemux86copy-oe-linux-musl:1.0:r5:qemux86copy:3:eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee_package.tgz
?

From one point of view it is great that we can re-use already-built sstate even if it was built for a completely different MACHINE and possibly even a different recipe and version, as long as the outhash is identical, but it blows up really badly once you get some conflicts in outhash calculations (and it really depends on what we'll include in the outhash: first we try to ignore timestamps, then users, then we might be interested only in the ABI for a given library).

And I'm still missing how this will prevent rebuilding e.g. chromium against new zlib even when the older zlib is sstate equivalent - we will reuse the older sstate from the older zlib build instead of rebuilding zlib, but what taskhash will OEBasicHash use when computing the new hash of chromium.do_configure? It should use the same one as in the previous chromium build (if they are in the same equivalency group), because if it uses any other one, then we'll end up with chromium rebuilding even when it's against identical zlib with just a different hash.

On Fri, Jun 8, 2018 at 5:09 AM Joshua Watt <jpewhacker@...> wrote:
Hello all,

I've been looking into the sstate equivalency mechanism discussed
[previously][1], and decided that it is probably time to report what I
am proposing, so that more discussion can be had.

[1]: http://lists.openembedded.org/pipermail/openembedded-architecture/
2018-May/000745.html


Please make note of the **QUESTION** lines, as they indicate
(known) outstanding issues that haven't been sorted out yet.

Apologies for the length. You can also see this in all of its markdown
glory
[here](https://gist.github.com/JPEWdev/506b70157cdbb59454e445fe71a57c7e
)

# Hash Equivalency Server

This document proposes a design for a Hash Equivalence Server that can
be used to centralize the records of what task hashes can be considered
equivalent. The server implements a REST HTTP API based on JSON data
to report to bitbake clients what task hash can be considered
equivalent on request, and also allows selected clients to report new
known equivalent hashes.

The hash equivalency server identifies task hashes based on these main
attributes:

1. `taskhash`: *Task Hash* - This is the value `${BB_TASKHASH}` that
   bitbake calculates from the inputs to the hash
2. `method`: *Output Hash Method* - The string method used to
   calculate the output hash. These will be described in detail later.
3. `outhash`: *Output Hash*: The hash derived from the build outputs
   using the `method`

**QUESTION** Do we need the `method`? Presumably, a hash is a hash and a
hash is unique, so the method is unnecessary since different hashing
methods will produce different hashes from the same input, but at the
same time we also *expect* a different `outhash` when the `method`
differs.

The server may track and report other additional information about the
task hashes, but the items described above are the minimum required.

Once bitbake has calculated the task hash and determined that it needs
to rerun a task, it can send an HTTP GET request to the Hash
Equivalency Server to ask if it knows of any equivalent tasks that can be
used in place of rebuilding. This GET request is formulated as:

    GET /v1/outhashes?taskhash=<TASKHASH>&method=<METHOD>

In this request, `<TASKHASH>` and `<METHOD>` are replaced by the
appropriate information about the task for which bitbake is making the
request.

**QUESTION** I think we only need to request equivalent hashes for
things that have to be rebuilt... is there any reason to request them
for all tasks regardless of if they need to be rebuilt or not?

**QUESTION** You could pretty easily get some useful statistics on how
many users are (re)building a given taskhash. I know that's a more
controversial topic that I would mostly like to avoid (for now), but
keep it in mind... In general I think having a hash equivalency server
opens up a lot of those kinds of possibilities.

The response from the GET request will return the HTTP 200 OK code,
and the result body will be a JSON object that describes the
equivalent hash. At a minimum, the object will provide the `outhash`
field, as this is required for bitbake to discover a matching sstate
object. In the event there are no equivalent hashes, an empty object
is returned. The following is an example of the returned JSON:

```json
{
    "outhash": "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
}
```

**QUESTION** Is there any reason to return more than one equivalent
hash?

Once an output hash is discovered for a task, the sstate task will use
it in place of the `BB_TASKHASH` variable for the purpose of looking
up sstate objects.

# Writing new sstate objects

In the event that a bitbake executor cannot find an equivalent hash
(either because no output hashes were reported, or no such sstate
objects exist), it will end up performing the build, then needing to
write out a new sstate object. These new sstate objects will always be
written using the output hash in the file name (in place of the
current `BB_TASKHASH`). These output hashes are calculated using one
of the hash calculation methods described below.

**QUESTION** Is there any desire to be able to do a build later
without the hash equivalency server? I don't think it would be
terribly difficult to have the sstate cache create symbolic links to
the equivalent task hashes with the name of the original task hash so
that builders who are unaware of the equivalency server can still find
the sstate objects.

# Reporting New Equivalent Hashes

A bitbake executor can report newly discovered equivalent hashes to
the hash equivalency server also via a REST API. These hashes can be
reported at the time the sstate object is written, using a POST
request:

    POST /v1/taskhashes

The body of the request must contain an array of JSON objects that
describe the equivalent hashes. At a minimum, each object must
contain the `taskhash`, `method`, and `outhash` fields. For example:

```json
[
    {
        "taskhash": "c7c51be323b56305e1e7e74f4011e810fd340908",
        "method": "basic",
        "outhash": "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
    },
    {
        "taskhash": "e7a3049fa3a27861d4b54d29b2e3b781f0dbf304",
        "method": "basic",
        "outhash": "114aa9bac59642fa951a0fed8359930fc602679d"
    }
]
```

**QUESTION** What do we do if multiple different output hashes are
reported for the same taskhash? This is actually pretty useful
information, since it would indicate a non-reproducible build.

**QUESTION** You can go pretty crazy with reporting here....
Hopefully, the three required fields are anonymous enough that there
are no qualms with reporting them.  Anything beyond that would
obviously be enormously helpful from a data-mining and statistics
perspective. You wouldn't be able to keep that anonymous enough, so it
would *have* to be opt-in. Again, not the focus of this discussion,
but keep it in mind

## Hash reporting security

The use of a public hash equivalency server has the potential to
introduce serious security flaws into consumers that choose to use it.
For example, it would be trivial for a malicious user to upload an
outhash for an unpatched version of software as being equivalent to
the patched version's taskhash. As such, it is important that the
identity of the source of all reported taskhashes be confirmed before
they are submitted to the database. To confirm the identity of posted
task hashes, users who wish to supply hashes must first upload a
public RSA key to the server which is tied to a username. When the
user wishes to POST a hash, the HTTP body doesn't contain the raw
JSON, but instead contains a JSON Web Signature string (per
[RFC7515](https://tools.ietf.org/html/rfc7515)), signed with the user's
private key. In order for the server to identify which user/key should
be used to verify the signature, the message should contain the "kid"
parameter in the unprotected header that is set to the username
corresponding to the signing key. Additionally, the request must also
include a numeric "nonce", field in the protected header. This nonce
value must be numerically larger than the last reported nonce by that
user, or the request will be rejected with an error. In this case,
bitbake will be made aware of the error and the last reported nonce
value so it can retry with a higher value.

The server will store the username associated with all reported hashes
to allow for easy revocation of hashes in the event of a private key
breach, or a request to clear out reported hashes from the user for
privacy reasons.

**QUESTION** Web technology isn't really my thing... is there a better
way?

# Hashing Methods

The hashing equivalency code needs to be provided with a hashing
algorithm to determine if two taskhashes are equivalent. The default
algorithm provided with the implementation will be named "basic" and
calculates a SHA1 using the algorithm found
[here](https://gist.github.com/JPEWdev/d5e8d339d6d33a505fee1fd049994262
)

**QUESTION** It is important to note what that algorithm does hash
(file path, mode, file type, contents), but also what it *doesn't*
hash (namely the owner/group and timestamps). I think this is what we
would like. Any suggestions?


Thanks,
Joshua Watt
_______________________________________________
Openembedded-architecture mailing list
Openembedded-architecture@...
http://lists.openembedded.org/mailman/listinfo/openembedded-architecture


Re: sstate equivalency

Joshua Watt
 

Hello all,

I've been looking into the sstate equivalency mechanism discussed
[previously][1], and decided that it is probably time to report what I
am proposing, so that more discussion can be had.

[1]: http://lists.openembedded.org/pipermail/openembedded-architecture/
2018-May/000745.html

Please make note of the **QUESTION** lines, as they indicate
(known) outstanding issues that haven't been sorted out yet.

Apologies for the length. You can also see this in all of its markdown
glory
[here](https://gist.github.com/JPEWdev/506b70157cdbb59454e445fe71a57c7e
)

# Hash Equivalency Server

This document proposes a design for a Hash Equivalence Server that can
be used to centralize the records of what task hashes can be considered
equivalent. The server implements a REST HTTP API based on JSON data
to report to bitbake clients what task hash can be considered
equivalent on request, and also allows selected clients to report new
known equivalent hashes.

The hash equivalency server identifies task hashes based on these main
attributes:

1. `taskhash`: *Task Hash* - This is the value `${BB_TASKHASH}` that
bitbake calculates from the inputs to the hash
2. `method`: *Output Hash Method* - The string method used to
calculate the output hash. These will be described in detail later.
3. `outhash`: *Output Hash*: The hash derived from the build outputs
using the `method`

**QUESTION** Do we need the `method`? Presumably, a hash is a hash and a
hash is unique, so the method is unnecessary since different hashing
methods will produce different hashes from the same input, but at the
same time we also *expect* a different `outhash` when the `method`
differs.

The server may track and report other additional information about the
task hashes, but the items described above are the minimum required.

Once bitbake has calculated the task hash and determined that it needs
to rerun a task, it can send an HTTP GET request to the Hash
Equivalency Server to ask if it knows of any equivalent tasks that can be
used in place of rebuilding. This GET request is formulated as:

GET /v1/outhashes?taskhash=<TASKHASH>&method=<METHOD>

In this request, `<TASKHASH>` and `<METHOD>` are replaced by the
appropriate information about the task for which bitbake is making the
request.

**QUESTION** I think we only need to request equivalent hashes for
things that have to be rebuilt... is there any reason to request them
for all tasks regardless of if they need to be rebuilt or not?

**QUESTION** You could pretty easily get some useful statistics on how
many users are (re)building a given taskhash. I know that's a more
controversial topic that I would mostly like to avoid (for now), but
keep it in mind... In general I think having a hash equivalency server
opens up a lot of those kinds of possibilities.

The response from the GET request will return the HTTP 200 OK code,
and the result body will be a JSON object that describes the
equivalent hash. At a minimum, the object will provide the `outhash`
field, as this is required for bitbake to discover a matching sstate
object. In the event there are no equivalent hashes, an empty object
is returned. The following is an example of the returned JSON:

```json
{
"outhash": "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
}
```

**QUESTION** Is there any reason to return more than one equivalent
hash?

Once an output hash is discovered for a task, the sstate task will use
it in place of the `BB_TASKHASH` variable for the purpose of looking
up sstate objects.
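
As a hedged sketch of that substitution (the naming pattern is modelled on the sstate file names discussed in this thread, not taken from sstate.bbclass):

```python
# Hedged sketch of how an executor might use a returned outhash when looking
# up an sstate object: substitute it for BB_TASKHASH in the object name.
# The naming pattern follows the examples quoted in this thread.
def sstate_object_name(pn, package_arch, pv, pr, machine, sig, task="package"):
    return "%s/sstate:%s:%s:%s:%s:%s:3:%s_%s.tgz" % (
        sig[:2], pn, package_arch, pv, pr, machine, sig, task)

taskhash = "ddf17a1eff3be1ff6a7ff393bbbf289b"
outhash = "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"  # from the server, if any
sig = outhash or taskhash
print(sstate_object_name("systemd-serialgetty", "qemux86-oe-linux-musl",
                         "1.0", "r5", "qemux86", sig))
```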

# Writing new sstate objects

In the event that a bitbake executor cannot find an equivalent hash
(either because no output hashes were reported, or no such sstate
objects exist), it will end up performing the build, then needing to
write out a new sstate object. These new sstate objects will always be
written using the output hash in the file name (in place of the
current `BB_TASKHASH`). These output hashes are calculated using one
of the hash calculation methods described below.

**QUESTION** Is there any desire to be able to do a build later
without the hash equivalency server? I don't think it would be
terribly difficult to have the sstate cache create symbolic links to
the equivalent task hashes with the name of the original task hash so
that builders who are unaware of the equivalency server can still find
the sstate objects.
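
A sketch of that symlink idea, purely illustrative:

```python
# Sketch of the symlink idea from the question above: give the outhash-named
# sstate object an alias carrying the original BB_TASKHASH-based name so that
# equivalence-unaware builders can still find it. Purely illustrative.
import os

def alias_sstate_object(outhash_path, taskhash_path):
    if not os.path.exists(taskhash_path):
        # Relative link, assuming both names live in the same sstate directory.
        os.symlink(os.path.basename(outhash_path), taskhash_path)
```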

# Reporting New Equivalent Hashes

A bitbake executor can report newly discovered equivalent hashes to
the hash equivalency server also via a REST API. These hashes can be
reported at the time the sstate object is written, using a POST
request:

POST /v1/taskhashes

The body of the request must contain an array of JSON objects that
describe the equivalent hashes. At a minimum, each object must
contain the `taskhash`, `method`, and `outhash` fields. For example:

```json
[
{
"taskhash": "c7c51be323b56305e1e7e74f4011e810fd340908",
"method": "basic",
"outhash": "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
},
{
"taskhash": "e7a3049fa3a27861d4b54d29b2e3b781f0dbf304",
"method": "basic",
"outhash": "114aa9bac59642fa951a0fed8359930fc602679d"
}
]
```

**QUESTION** What do we do if multiple different output hashes are
reported for the same taskhash? This is actually pretty useful
information, since it would indicate a non-reproducible build.

**QUESTION** You can go pretty crazy with reporting here....
Hopefully, the three required fields are anonymous enough that there
are no qualms with reporting them. Anything beyond that would
obviously be enormously helpful from a data-mining and statistics
perspective. You wouldn't be able to keep that anonymous enough, so it
would *have* to be opt-in. Again, not the focus of this discussion,
but keep it in mind

## Hash reporting security

The use of a public hash equivalency server has the potential to
introduce serious security flaws into consumers that choose to use it.
For example, it would be trivial for a malicious user to upload an
outhash for an unpatched version of software as being equivalent to
the patched version's taskhash. As such, it is important that the
identity of the source of all reported taskhashes be confirmed before
they are submitted to the database. To confirm the identity of posted
task hashes, users who wish to supply hashes must first upload a
public RSA key to the server which is tied to a username. When the
user wishes to POST a hash, the HTTP body doesn't contain the raw
JSON, but instead contains a JSON Web Signature string (per
[RFC7515](https://tools.ietf.org/html/rfc7515)), signed with the user's
private key. In order for the server to identify which user/key should
be used to verify the signature, the message should contain the "kid"
parameter in the unprotected header that is set to the username
corresponding to the signing key. Additionally, the request must also
include a numeric "nonce", field in the protected header. This nonce
value must be numerically larger than the last reported nonce by that
user, or the request will be rejected with an error. In this case,
bitbake will be made aware of the error and the last reported nonce
value so it can retry with a higher value.
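
A rough sketch of that retry behaviour (sign_jws() and the "last_nonce" error field are hypothetical stand-ins, not an existing API):

```python
# Rough sketch of the nonce-retry behaviour described above. sign_jws() stands
# in for whatever RFC 7515 implementation would be used and must return the
# serialized JWS as bytes; "last_nonce" is a hypothetical error field.
import json
import urllib.error
import urllib.request

def report_hashes(server, username, key, entries, sign_jws, nonce=0):
    while True:
        body = sign_jws(key, kid=username, nonce=nonce, payload=entries)
        req = urllib.request.Request("%s/v1/taskhashes" % server, data=body,
                                     headers={"Content-Type": "application/jose"},
                                     method="POST")
        try:
            with urllib.request.urlopen(req):
                return nonce
        except urllib.error.HTTPError as e:
            info = json.loads(e.read() or b"{}")
            if "last_nonce" not in info:
                raise
            # Server says our nonce was stale; retry with a higher value.
            nonce = info["last_nonce"] + 1
```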

The server will store the username associated with all reported hashes
to allow for easy revocation of hashes in the event of a private key
breach, or a request to clear out reported hashes from the user for
privacy reasons.

**QUESTION** Web technology isn't really my thing... is there a better
way?

# Hashing Methods

The hashing equivalency code needs to be provided with a hashing
algorithm to determine if two taskhashes are equivalent. The default
algorithm provided with the implementation will be named "basic" and
calculates a SHA1 using the algorithm found
[here](https://gist.github.com/JPEWdev/d5e8d339d6d33a505fee1fd049994262
)

**QUESTION** It is important to note what that algorithm does hash
(file path, mode, file type, contents), but also what it *doesn't*
hash (namely the owner/group and timestamps). I think this is what we
would like. Any suggestions?
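
A simplified sketch in the spirit of the "basic" method (not the referenced gist verbatim): hash the relative path, mode and file type of each entry plus regular-file contents and symlink targets, deliberately ignoring owner/group and timestamps:

```python
# Simplified sketch of a "basic"-style output hash: include relative path,
# permission bits, file type and contents; ignore owner/group and timestamps.
import hashlib
import os
import stat

def basic_outhash(root):
    h = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            h.update(os.path.relpath(path, root).encode())
            h.update(("%o" % stat.S_IMODE(st.st_mode)).encode())
            h.update(stat.filemode(st.st_mode)[0].encode())  # file type char
            if stat.S_ISLNK(st.st_mode):
                h.update(os.readlink(path).encode())
            elif stat.S_ISREG(st.st_mode):
                with open(path, "rb") as f:
                    h.update(f.read())
    return h.hexdigest()
```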


Thanks,
Joshua Watt


Re: combining trusted/security layers

Trevor Woerner
 

On Thu, Jun 7, 2018 at 9:06 AM, Jia Zhang <zhang.jia@...> wrote:
You remind me that I also need to modify cryptfs-tpm2 to interface the
new ldr API.

That would be awesome!! I am very interested in your cryptfs-tpm2 project and would like to see it working again.


Re: combining trusted/security layers

Trevor Woerner
 

Hi Randy,

Thanks for relaying, and continuing the conversation.

On Wed 2018-06-06 @ 05:44:41 PM, Randy MacLeod wrote:
Email from Jia:

Categorizing the recipes in meta-security may be the hardest work in the
whole move. I took a quick glance at the list (thanks, Trevor!) and a
big category would be penetration testing (meta-penetration-test?). We need
more categories to cover the remaining tools. Definitely, the naming
scheme is a challenge for me.
Okay. I can create a meta-penetration-test or meta-pentesting layer in my fork
of Jia's meta-secure-core and start putting the relevant recipes of that
category there to see what others think.

meta-secure-core already has a meta-ids (intrusion detection system) layer, so
I can look through meta-security's list to see which ones apply to that
category also.

When done, if the remaining recipes don't fit into any obvious category, I'll
poke this list again (or bring it up in the calls) to get others' feedback.

Regarding meta-tpm1/2, we could consider cherry-picking one of the 3
layers as the baseline and moving the trivial parts of the recipes from
the other 2 layers into the baseline. Other conflicting recipes would
follow the same methodology.
I have a WIP branch here with my work updating and bringing the latest TPM2
stuff into meta-secure-core. It's hung up now because, for the git recipes,
the Intel people are dlopen()'ing raw .so files, and I'm trying to get them to
rectify this before their next API-changing release of their latest TSS
libraries:

https://github.com/twoerner/meta-secure-core/tree/contrib/twoerner/tpm2-recipe-updates

Thanks!


Re: combining trusted/security layers (fix typo in Jia's address)

Randy MacLeod
 

Oops.
Fix typo in Jia's address: .co -> .com


On 05/24/2018 05:26 PM, Trevor Woerner wrote:
Hi everyone, and thanks for all the feedback that's been given already.
I think it would be a great idea if we could get the various trusted/security
layers working together on one layer instead of having separate efforts. As
far as I'm aware, there are currently 3 such layers:
...
From what is presented in the spreadsheet, in my opinion, I don't think it'll
be too hard to get everything in one layer. Surprisingly, there isn't a lot of
overlap. Therefore, all the unique bits from each layer can simply be added to
the one chosen layer. The only real overlap is in the tpm stuff, and that
should be easy to update once in the chosen layer.
The easiest way to combine the layers would be to make meta-security another
sub-layer of meta-secure-core. But I think that might be too simplistic.
meta-security includes a hodgepodge of user-space tools and daemons for
doing miscellaneous security things (recipes-security). meta-secure-core tries
to break logical activities into their own layers (i.e. meta-ids for intrusion
detection systems, meta-integrity for integrity measurement architecture
(ima), etc). If it would be possible to categorize all of the recipes in
meta-security's recipes-security directory, then maybe we could start
distributing them into meta-secure-core and/or creating spaces for them?
Thoughts?
Add Jia, who I've been talking with about our discussion on the
YP tech call yesterday. Hopefully he'll get his email situation
fixed and can carry on without me being a relay node.


Email from Jia:

My email client has a filtering problem receiving emails from gmail for
some unknown reason (not a proxy issue), so I cannot reply to him
directly. Could you do me a favor and copy my reply there?

-- reply --

I'm pleased to see this move. It sounds great to combine everything in
one place with a unified design model. It also avoids duplicated
maintenance work, and it gives a fine-grained way to select a subset of
features from the larger whole.

Categorizing the recipes in meta-security may be the hardest work in the
whole move. I took a quick glance at the list (thanks, Trevor!) and a
big category would be penetration testing (meta-penetration-test?). We
need more categories to cover the remaining tools. The naming scheme is
definitely a challenge for me.

Regarding meta-tpm1/2, we could consider cherry-picking one of the 3
layers as the baseline and moving the trivial parts of the recipes from
the other 2 layers into the baseline. Other conflicting recipes would
follow the same methodology.

Thanks,
Jia

--
# Randy MacLeod
# Wind River Linux


Re: combining trusted/security layers

Randy MacLeod
 

On 05/24/2018 05:26 PM, Trevor Woerner wrote:
Hi everyone, and thanks for all the feedback that's been given already.
I think it would be a great idea if we could get the various trusted/security
layers working together on one layer instead of having separate efforts. As
far as I'm aware, there are currently 3 such layers:
...
From what is presented in the spreadsheet, in my opinion, I don't think it'll
be too hard to get everything in one layer. Surprisingly, there isn't a lot of
overlap. Therefore, all the unique bits from each layer can simply be added to
the one chosen layer. The only real overlap is in the tpm stuff, and that
should be easy to update once in the chosen layer.
The easiest way to combine the layers would be to make meta-security another
sub-layer of meta-secure-core. But I think that might be too simplistic.
meta-security includes a hodgepodge of user-space tools and daemons for
doing miscellaneous security things (recipes-security). meta-secure-core tries
to break logical activities into their own layers (i.e. meta-ids for intrusion
detection systems, meta-integrity for integrity measurement architecture
(ima), etc). If it would be possible to categorize all of the recipes in
meta-security's recipes-security directory, then maybe we could start
distributing them into meta-secure-core and/or creating spaces for them?
Thoughts?
Add Jia, who I've been talking with about our discussion on the
YP tech call yesterday. Hopefully he'll get his email situation
fixed and can carry on without me being a relay node.


Email from Jia:

My email client has a filtering problem receiving emails from gmail for
some unknown reason (not a proxy issue), so I cannot reply to him
directly. Could you do me a favor and copy my reply there?

-- reply --

I'm pleased to see this move. It sounds great to combine everything in
one place with a unified design model. It also avoids duplicated
maintenance work, and it gives a fine-grained way to select a subset of
features from the larger whole.

Categorizing the recipes in meta-security may be the hardest work in the
whole move. I took a quick glance at the list (thanks, Trevor!) and a
big category would be penetration testing (meta-penetration-test?). We
need more categories to cover the remaining tools. The naming scheme is
definitely a challenge for me.

Regarding meta-tpm1/2, we could consider cherry-picking one of the 3
layers as the baseline and moving the trivial parts of the recipes from
the other 2 layers into the baseline. Other conflicting recipes would
follow the same methodology.

Thanks,
Jia

--
# Randy MacLeod
# Wind River Linux


combining trusted/security layers

Trevor Woerner
 

Hi everyone, and thanks for all the feedback that's been given already.

I think it would be a great idea if we could get the various trusted/security
layers working together on one layer instead of having separate efforts. As
far as I'm aware, there are currently 3 such layers:

meta-measured (http://layers.openembedded.org/layerindex/branch/master/layer/meta-measured/)
meta-security (http://layers.openembedded.org/layerindex/branch/master/layer/meta-security/)
meta-secure-core (http://layers.openembedded.org/layerindex/branch/master/layer/meta-secure-core/)

I personally am most familiar with meta-measured, and I'm mostly only
interested in tpm2 on an RPi3B+.

In an effort to try to help gather the data required to jumpstart this
conversation, I've created a simple google doc that lists these three layers,
their recipes, and provides the results of building against 2 MACHINEs:

intel-corei7-64 (from meta-intel)
raspberrypi3 (from meta-raspberrypi)

https://docs.google.com/spreadsheets/d/1AlH0Q0lGC3idwyFLSt7df09sIXkBuv191fVESUA-oQY/edit?usp=sharing

Please have a look. This spreadsheet is very simple and only looks at
recipes; it does not include any information about various bbappends,
kernel configurations, packagegroups, classes, sample images, etc.

meta-measured is a plain, straight-forward layer that contains recipes.

meta-security contains recipes, but also contains 2 sub-layers:
- meta-tpm
- meta-security-compliance

meta-secure-core is a meta-layer, containing no recipes itself, but collecting
together a set of sub-layers:
- meta
- meta-encrypted-storage
- meta-integrity
- meta-efi-secure-boot
- meta-ids
- meta-signing-key
- meta-tpm
- meta-tpm2

From what is presented in the spreadsheet, in my opinion, I don't think it'll
be too hard to get everything in one layer. Surprisingly, there isn't a lot of
overlap. Therefore, all the unique bits from each layer can simply be added to
the one chosen layer. The only real overlap is in the tpm stuff, and that
should be easy to update once in the chosen layer.

The easiest way to combine the layers would be to make meta-security another
sub-layer of meta-secure-core. But I think that might be too simplistic.
meta-security includes a hodgepodge of user-space tools and daemons for
doing miscellaneous security things (recipes-security). meta-secure-core tries
to break logical activities into their own layers (i.e. meta-ids for intrusion
detection systems, meta-integrity for integrity measurement architecture
(ima), etc). If it would be possible to categorize all of the recipes in
meta-security's recipes-security directory, then maybe we could start
distributing them into meta-secure-core and/or creating spaces for them?

Thoughts?

Best regards,
Trevor


Re: Trusted/secure/etc layers

Trevor Woerner
 

On Tue, May 8, 2018 at 5:04 AM, Joshua Lock <joshua.g.lock@...> wrote:


On 01/05/2018 19:57, Trevor Woerner wrote:
I don't think any of them support ESAPI or FAPI.

tpm2-tss 2.0, currently in RC, comes with ESAPI support.


Excellent, thanks for the update!
