GNU bug report logs - #72145
rare Emacs screwups on x86 due to GCC bug 58416


Package: emacs

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Tue, 16 Jul 2024 23:27:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #31 received at 72145 <at> debbugs.gnu.org (full text, mbox):

From: Andrea Corallo <acorallo <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 72145 <at> debbugs.gnu.org
Subject: Re: bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
Date: Thu, 18 Jul 2024 10:19:03 -0400

Paul Eggert <eggert <at> cs.ucla.edu> writes:

> While testing GNU Emacs built on Fedora 40 with gcc (GCC) 14.1.1
> 20240607 (Red Hat 14.1.1-5) with -m32 for x86 and configured
> --with-wide-int, I discovered that Emacs misbehaved in a hard-to-debug
> way due to GCC bug 58416. This bug causes GCC to generate wrong x86
> machine instructions when a C program accesses a union containing a
> 'double'.
>
> The bug I observed is that if you have something like this:
>
>    union u { double d; long long int i; } u;
>
> then GCC sometimes generates x86 instructions that copy u.i by using
> fldl/fstpl instruction pairs to push the 64-bit quantity onto the 387
> floating point stack, and then pop the stack into another memory
> location. Unfortunately the fldl/fstpl trick fails in the unusual case
> when the bit pattern of u.i, when interpreted as a double, is a NaN,
> as that can cause the fldl/fstpl pair to store a different NaN with a
> different bit pattern, which means the destination integer disagrees
> with u.i.
>
> The bug is obscure, since the bug's presence depends on the GCC
> version, on the optimization options used, on the exact source code,
> and on the exact integer value at runtime (the value is typically
> copied correctly even when GCC has generated the incorrect machine
> code, since most long long int values don't alias with NaNs).
>
> In short the bug appears to be rare.
>
> Here are some possible courses of action:
>
> * Do nothing and hope x86 users won't run into this rare bug.
>
> * Have the GCC folks fix the bug. However, given that the bug has been
>   reported multiple times over more than a decade without a fix, it
>   seems that fixing it is too difficult and/or too low priority for
>   this aging platform. Also, even if the bug is fixed in a future GCC,
>   it will still affect people using older GCC versions.
>
> * Build with Clang or some other compiler instead. We should be
>   encouraging GCC, though.
>
> * Rewrite Emacs to never use 'double' (or 'float' or 'long double')
>   inside a union. This could be painful and hardly seems worthwhile.
>
> * When using GCC to build Emacs on x86, compile with safer options
>   that make the bug impossible. The attached proposed patch does that,
>   by telling GCC not to use the 387 stack. (This patch fixed the Emacs
>   misbehavior in my experimental build.) The downside is that the
>   resulting Emacs executables need SSE2, introduced for the Pentium 4
>   in 2000 <https://en.wikipedia.org/wiki/SSE2>. Nowadays few users
>   need to run Emacs on non-SSE2 x86, so this may be good enough. Also,
>   the proposed patch gives the builder an option to compile Emacs
>   without the safer options, for people who want to build for older
>   Intel-compatible platforms and who don't mind an occasional wrong
>   answer or crash.
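
To make the failure mode described in the quoted report concrete, here is a minimal C sketch of which 64-bit integer values are at risk. It is not taken from the report or the patch. The function names are hypothetical, and model_x87_round_trip rests on an assumption: that an fldl/fstpl pair changes a signaling NaN only by setting its quiet bit (bit 51), which is the usual x87 behavior with the invalid-operation exception masked. The code does not execute any x87 instructions; it only inspects bit patterns.

```c
#include <assert.h>
#include <stdint.h>

/* True if the 64-bit pattern, read as an IEEE 754 binary64 value, is a
   NaN: exponent field all ones and a nonzero fraction field. */
int bits_alias_nan(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7FF;
    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFULL;
    return exponent == 0x7FF && fraction != 0;
}

/* True if the pattern is a signaling NaN, i.e. a NaN whose quiet bit
   (bit 51, the top fraction bit) is clear. */
int bits_alias_signaling_nan(uint64_t bits)
{
    return bits_alias_nan(bits) && !(bits & (1ULL << 51));
}

/* ASSUMPTION: models an fldl/fstpl copy as "set the quiet bit of a
   signaling NaN, leave every other pattern unchanged".  This is a model
   for illustration, not a measurement of real hardware. */
uint64_t model_x87_round_trip(uint64_t bits)
{
    return bits_alias_signaling_nan(bits) ? (bits | (1ULL << 51)) : bits;
}

/* Hypothetical self-check showing one integer the model corrupts and
   several it leaves alone. */
void check_x87_model(void)
{
    /* Ordinary small integers do not alias NaNs and survive the copy. */
    assert(model_x87_round_trip(42) == 42);
    /* The pattern of +infinity is not a NaN either. */
    assert(!bits_alias_nan(0x7FF0000000000000ULL));
    /* A quiet NaN pattern is already in the form the model stores. */
    assert(model_x87_round_trip(0x7FF8000000000000ULL)
           == 0x7FF8000000000000ULL);
    /* This integer aliases a signaling NaN, so the modeled copy
       stores a different integer. */
    assert(model_x87_round_trip(0x7FF0000000000001ULL)
           == 0x7FF8000000000001ULL);
}
```

Under this model only patterns with an all-ones exponent field, a nonzero fraction, and a clear bit 51 are altered, which is consistent with the report's remark that most long long int values are copied correctly even by the miscompiled code.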

Mmmh nice one :)

I asked GCC people if they have a suggestion on how to work around this
bug <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416#c9>.

Thanks

  Andrea
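
As a sketch of one possible workaround check, the C fragment below reports whether a translation unit was built in the configuration the report describes as risky. It is not the attached patch, whose contents are not shown in this message. It assumes that "telling GCC not to use the 387 stack" corresponds to building with options such as -msse2 -mfpmath=sse, in which case GCC predefines __SSE2_MATH__. The function name is hypothetical.

```c
/* Returns 1 when this file was compiled by GCC (not Clang) for 32-bit
   x86 without SSE2 floating-point math, i.e. when the compiler is free
   to emit x87 loads and stores; returns 0 otherwise.
   ASSUMPTION: absence of __SSE2_MATH__ on i386 is used here as a proxy
   for "x87 code may be generated".  The real patch may test something
   else. */
int built_with_x87_math(void)
{
#if defined __GNUC__ && !defined __clang__ \
    && defined __i386__ && !defined __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}
```

A build could call this once at startup and log a warning, or the same preprocessor condition could be turned into a compile-time diagnostic. On x86-64 and on non-x86 targets the function returns 0, since __i386__ is not defined there.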




This bug report was last modified 276 days ago.


