Y2038 proposal


Alexander Kanavin
 

On Tue, 29 Nov 2022 at 16:45, Stephen Jolley <sjolley.yp.pm@...> wrote:
> We’d welcome a proposal/series on how to move forward with the Y2038 work for 32-bit platforms.

I have the following proposal:

1. A branch is made where:
a. "-D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64" is enabled globally.
b. qemu is always started with "-rtc base=2040-01-01", simulating
Y2038 actually occurring.
c. an additional runtime test verifies that both RTC clock and system
clock report 2040.

2. This branch is run through a-full on the autobuilder. Any uncovered
issues are filed as bugs.

3. Once *all* of the bugs are addressed, repeat point 2.

4. Once there are no more open bugs, 1a is merged into master.

Any fatal flaws in the plan?

It's not hard to see that the Y2038 problem is real and serious, e.g. on
qemux86 with core-image-full-cmdline built from master:

root@qemux86:~# ls /
bin boot dev etc home lib lost+found media mnt proc
run sbin sys tmp usr var
root@qemux86:~# date -s "2040-01-01"
Sun Jan 1 00:00:00 UTC 2040
root@qemux86:~# ls /
bin boot dev etc home lib lost+found media mnt proc
run sbin sys tmp usr var
root@qemux86:~# ls /
-sh: ls: command not found

On qemux86_64 the same sequence works as expected, of course.

Alex


Matt Johnston
 

On Wed, 2022-11-30 at 09:07 +0100, Alexander Kanavin wrote:
> On Tue, 29 Nov 2022 at 16:45, Stephen Jolley <sjolley.yp.pm@...> wrote:
> > We’d welcome a proposal/series on how to move forward with
> > the Y2038 work for 32 bit platforms.
> I have the following proposal:
>
> 1. A branch is made where:
> a. "-D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64" is enabled globally.
> b. qemu is always started with "-rtc base=2040-01-01", simulating
> Y2038 actually occurring.
> c. an additional runtime test verifies that both RTC clock and system
> clock report 2040.
>
> 2. This branch is run through a-full on the autobuilder. Any uncovered
> issues are filed as bugs.

Your email prompted me to check my own software (Dropbear) and it showed a
few y2038 issues to fix. Those bugs wouldn't be noticed from a quick test -
it "only" prevented auth and idle timeouts from occurring.

gcc and clang are able to flag truncated conversions for 64-bit time_t with 
-Wconversion, but that's very noisy. Comparing that against a 32-bit time_t
build, however, gives a pretty clean list of code that needs attention.

As an experiment I've built OpenBMC with and without 64-bit time_t;
https://github.com/mkj/yocto-y2038 has the results and a description. There
is a mix of false positives (particularly tv_usec/tv_nsec), but also some
real-looking things. As an example, busybox uses a uint32_t to copy a dhcpd
lease expiry.

I'm not sure of the best way to use these logs - they need manual review.
Expanding the list of packages should be easy, but there will be more that
need manual intervention to get rid of -Werror.

Cheers,
Matt


Richard Purdie
 

On Fri, 2022-12-02 at 16:54 +0800, Matt Johnston wrote:
> Your email prompted me to check my own software (Dropbear) and it showed a
> few y2038 issues to fix. Those bugs wouldn't be noticed from a quick test -
> it "only" prevented auth and idle timeouts from occurring.
>
> gcc and clang are able to flag truncated conversions for 64-bit time_t with
> -Wconversion, but that's very noisy. Comparing that against a 32-bit time_t
> build, however, gives a pretty clean list of code that needs attention.
>
> As an experiment I've built OpenBMC with and without 64-bit time_t;
> https://github.com/mkj/yocto-y2038 has the results and a description. There
> is a mix of false positives (particularly tv_usec/tv_nsec), but also some
> real-looking things. As an example, busybox uses a uint32_t to copy a dhcpd
> lease expiry.
>
> I'm not sure of the best way to use these logs - they need manual review.
> Expanding the list of packages should be easy, but there will be more that
> need manual intervention to get rid of -Werror.

That is really interesting data, as it confirms that changing the
compiler flags is going to break things in real-world software. Thanks
for sharing.

What you describe is relatively easy for a maintainer to do as a
one-off check, but it is not really something we can do at scale for all
the software we build. It worries me :/. I guess the one upside is that
whilst it did break some functionality, it didn't actually crash the
runtime, if I understand what happened correctly.

I'm not sure this should stop our plan to switch the flags but it is
certainly something to think about and be aware of.

Cheers,

Richard