[erlang-bugs] math:exp() erlang bug on SPARC

Sat Apr 16 21:12:08 CEST 2016

Matt Keenan writes:
 > Hi,
 > 
 > Just wanted to report a SPARC specific issue with Erlang.
 > 
 > See discussions:
 >    https://github.com/rabbitmq/rabbitmq-server/issues/132
 >    https://groups.google.com/forum/#!topic/rabbitmq-users/Gca8vW52gB8
 > 
 > Only happens on SPARC systems; does not occur on x86 architecture.
 > e.g.:
 > 
 > On s11u3 x86:
 > 1> math:exp(-1162.102134881488).
 > 0.0
 > 
 > On s11u3 SPARC:
 > 1> math:exp(-1162.102134881488).
 > ** exception error: an error occurred when evaluating an arithmetic 
 > expression in function math:exp/1 called as math:exp(-1162.102134881488)
 > 
 > Both systems are running 64-bit Erlang 17.5 in a 64-bit environment.
 > 
 > Has an issue related to this been posted before ?

After reproducing this on Solaris 10 / SPARC I recalled:
http://erlang.org/pipermail/erlang-bugs/2014-December/004731.html

First, this call to exp(-1162.1...) goes straight to libm's exp(), which
detects an underflow and reports an error on both Solaris (SPARC) and Linux
(x86_64, sparc64, and ARMv7, all with glibc).  In this case, exp() returns 0,
sets errno, and sets flag bits testable by fetestexcept().

On Solaris, libm also calls matherr() as per ancient SVID, and ERTS treats
that as an implicit FP exception.

After exp() returns, ERTS checks (1) if the return value is !isfinite()
[but here it's a finite 0], or (2) if the thread's FP exception flag is set.
On Solaris it's set due to the matherr() call, so the call is treated as
an error.  On Linux we see a finite return value, don't check the error
indicators, and go on as if no error occurred.  (That may be correct for
underflow errors, I don't know.)
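
The two-part check described above can be modelled roughly as follows.  This
is a simplified sketch, not the ERTS source: call_treated_as_error and its
fp_exception_flagged parameter are hypothetical stand-ins for the real check
and for the thread's FP exception flag.

```c
#include <math.h>
#include <stdbool.h>

/*
 * Hypothetical model of the post-call check:
 *   (1) a non-finite result is an error, or
 *   (2) a set FP exception flag is an error.
 */
static bool call_treated_as_error(double result, bool fp_exception_flagged)
{
    return !isfinite(result) || fp_exception_flagged;
}
```

With a result of 0.0, the model reports an error only when the flag is set,
which corresponds to the Solaris case where matherr() has set it; with the
flag clear, as on Linux, the same finite result passes.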

The older email thread was about a similar issue with math:pow/2.

Our treatment of matherr() is completely wrong.  With FP exceptions disabled,
we derive no information from matherr() calls, so all error checking is
expressed in the code around those libm calls.  For libm calls, checking for
real FP exceptions is needed to reset parts of the FPU controls, but that
isn't needed for the fake FP exceptions derived from matherr() calls.

In short, I think matherr() should be a no-op regardless of NO_FPE_SIGNALS.

The patch below should fix your issue on Solaris.  I'll issue a PR later.

/Mikael

diff --git a/erts/emulator/sys/ose/sys_float.c b/erts/emulator/sys/ose/sys_float.c
index 5187579..764c35b 100644
--- a/erts/emulator/sys/ose/sys_float.c
+++ b/erts/emulator/sys/ose/sys_float.c
@@ -836,10 +836,5 @@ sys_chars_to_double(char* buf, double* fp)
 int
 matherr(struct exception *exc)
 {
-#if !defined(NO_FPE_SIGNALS)
-    volatile unsigned long *fpexnp = erts_get_current_fp_exception();
-    if (fpexnp != NULL)
-	*fpexnp = (unsigned long)__builtin_return_address(0);
-#endif
     return 1;
 }
diff --git a/erts/emulator/sys/unix/sys_float.c b/erts/emulator/sys/unix/sys_float.c
index 8fe7e59..beb5cb5 100644
--- a/erts/emulator/sys/unix/sys_float.c
+++ b/erts/emulator/sys/unix/sys_float.c
@@ -838,11 +838,6 @@ sys_chars_to_double(char* buf, double* fp)
 int
 matherr(struct exception *exc)
 {
-#if !defined(NO_FPE_SIGNALS)
-    volatile unsigned long *fpexnp = erts_get_current_fp_exception();
-    if (fpexnp != NULL)
-	*fpexnp = (unsigned long)__builtin_return_address(0);
-#endif
     return 1;
 }
 
diff --git a/erts/emulator/sys/win32/sys_float.c b/erts/emulator/sys/win32/sys_float.c
index 86e822d..14d4fa0 100644
--- a/erts/emulator/sys/win32/sys_float.c
+++ b/erts/emulator/sys/win32/sys_float.c
@@ -139,8 +139,7 @@ sys_double_to_chars_ext(double fp, char *buffer, size_t buffer_size, size_t deci
 int
 matherr(struct _exception *exc)
 {
-    erl_fp_exception = 1;
-    DEBUGF(("FP exception (matherr) (0x%x) (%d)\n", exc->type, erl_fp_exception));
+    DEBUGF(("FP exception (matherr) (0x%x)\n", exc->type));
     return 1;
 }