[erlang-bugs] Different handling of floating point underflows between Linux and Solaris-based OSes

Mikael Pettersson mikpelinux@REDACTED
Fri Jan 2 12:42:15 CET 2015


Corey Cossentino writes:
 > OK, finished running the tests on an OmniOS virtual machine. I'm not
 > completely sure how to interpret the results, but it looks like a lot
 > of tests are failing, in both the patched and unpatched version.
 > 
 > The differences I can see between the two runs, based on the
 > index.html file that was generated:
 >  tests.common_test_test - went from 1 failure to 3 with the code change
 >  tests.tools_test - went from 1 failure to 2 with the code change
 > 
 > Is there a file I should send over that would give more information?

I don't remember exactly how the test suite results are logged, but if you could

a) double-check that the failures are consistent (i.e., not Heisenbugs), and
b) list the test cases that fail with the patch, and whatever output they
produced that might indicate why they failed

that'd be good.

My guess is that the matherr() interface still needs to set the "pending exception"
flag in some cases, presumably for functions that return HUGE rather than HUGE_VAL
on overflow and whose failures are therefore not caught by the !isfinite() check.
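
To make that concern concrete, here is a minimal C sketch. It is not the
VM's actual code: vm_style_check_failed() is a hypothetical name standing in
for the !isfinite() test, and DBL_MAX is only an assumed stand-in for a libm
that reports overflow with a large but finite HUGE.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper (not taken from the OTP sources): mimics a check
 * that treats any non-finite libm result as a failed math call. */
int vm_style_check_failed(double result)
{
    return !isfinite(result);
}

/* Overflow reported as HUGE_VAL, which is +infinity on IEEE 754 systems. */
double overflow_as_huge_val(void)
{
    return HUGE_VAL;
}

/* Assumption for illustration only: DBL_MAX stands in for a libm that
 * returns a large but finite value on overflow. */
double overflow_as_finite_huge(void)
{
    return DBL_MAX;
}
```

With these definitions the check flags overflow_as_huge_val() but lets
overflow_as_finite_huge() through, which is the gap that matherr() plus the
"pending exception" flag would have to cover.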

If you need a quick workaround for the exceptions, you can rebuild OTP with
--disable-fp-exceptions. That will have the same effect as the patch, but as
your test suite results show, support for your Solaris derivative may be
slightly inadequate.

/Mikael

 > 
 > On Fri, Dec 26, 2014 at 1:07 PM, Mikael Pettersson <mikpelinux@REDACTED> wrote:
 > > Corey Cossentino writes:
 > >  > I sent this yesterday but it doesn't look like it went through, so
 > >  > apologies if anyone gets this twice.
 > >  >
 > >  >
 > >  > Calculating math:pow(2, -1075) returns 0 on Linux, but causes an
 > >  > exception on a Solaris-based system. This was causing some crashes in
 > >  > RabbitMQ when it tries to calculate math:exp with inputs less than
 > >  > -745.133.
 > >  >
 > >  > Using OTP 17.4 on OmniOS r151006.
 > >  >
 > >  > --
 > >  >
 > >  > Erlang/OTP 17 [erts-6.3] [source] [smp:24:24] [async-threads:10]
 > >  > [hipe] [kernel-poll:false]
 > >  >
 > >  > Eshell V6.3  (abort with ^G)
 > >  > 1> math:pow(2, -1074.999).
 > >  > 5.0e-324
 > >  > 2> math:pow(2, -1074) * math:pow(2, -1).
 > >  > 0.0
 > >  > 3> math:pow(2, -1075).
 > >  > ** exception error: an error occurred when evaluating an arithmetic
 > >  > expression
 > >  >      in function  math:pow/2
 > >  >         called as math:pow(2,-1075)
 > >  > 4> math:exp(-745).
 > >  > 5.0e-324
 > >  > 5> math:exp(-746).
 > >  > ** exception error: an error occurred when evaluating an arithmetic
 > >  > expression
 > >  >      in function  math:exp/1
 > >  >         called as math:exp(-746)
 > >
 > > I can reproduce this on Solaris 10 / SPARC.
 > >
 > > I have reviewed the situation with matherr() on Linux/glibc and Solaris 10,
 > > and I believe a reasonable resolution is to remove the #if !NO_FPE_SIGNALS
 > > block in matherr(), so it reduces to a single "return 1;".
 > >
 > > There are problems with checking math routine results for errors in general,
 > > and the matherr() interface in particular.
 > >
 > > 1. The VM relies on !isfinite() to detect if a math routine failed.
 > >    This appears to work on most systems, but there is a potential problem
 > >    in how various systems and libm implementations behave: while most
 > >    return HUGE_VAL (== INFINITY) on overflows, some return HUGE, which is
 > >    a large but finite value.  Solaris' cc -Xt does the latter, but gcc on
 > >    Solaris does the former.  On my glibc-based Linux systems, matherr(3)
 > >    lists HUGE as the return value on overflows for some routines, but my
 > >    tests indicate that HUGE_VAL is returned instead, which while good is
 > >    inconsistent with parts of the documentation.
 > >
 > >    It's entirely possible that other libm implementations also return HUGE
 > >    rather than HUGE_VAL on overflows, which thoroughly breaks our !isfinite()
 > >    test.  On Linux there are at least 3 non-glibc libc/libm implementations,
 > >    and who knows what's in all those *BSD variants.
 > >
 > > 2. matherr(), when properly enabled, is also called in situations the VM does
 > >    not consider to be errors, in particular the underflow case you reported.
 > >    When FP exceptions are also enabled, matherr() sets the FP exception flag,
 > >    causing underflows to erroneously trigger errors.
 > >
 > >    However, on systems where plain HUGE is returned for overflows, matherr()
 > >    + FP exceptions may be the only viable way of detecting those errors.
 > >
 > > 3. As you discovered, matherr() isn't enabled by default on Linux.
 > >
 > > As long as we limit ourselves to systems that consistently return HUGE_VAL
 > > on overflows, as Linux/glibc and Solaris w/ gcc do, we don't need matherr()
 > > to detect errors, which is why having it just return 1 should be OK.
 > >
 > > Can you run the emulator test suite on your Solaris system, first with
 > > vanilla 17.4 and then with the proposed code change, and check that the
 > > test suite results are the same?
 > >
 > > /Mikael
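
For reference, the underflow case from the original report can be sketched
in C without calling libm. This is only an illustration under the assumption
of IEEE 754 doubles with the default round-to-nearest mode; both function
names are hypothetical. Halving 2^-1074, the smallest subnormal, mirrors the
math:pow(2, -1074) * math:pow(2, -1) line in the shell session above and
rounds to zero, which is a finite value.

```c
#include <math.h>

/* Hypothetical helper: halves the smallest subnormal double (2^-1074).
 * The exact result, 2^-1075, is not representable and rounds to 0.0
 * under round-to-nearest-even. The volatile qualifiers keep the compiler
 * from folding the expression and force the result to double precision. */
double halve_smallest_subnormal(void)
{
    volatile double tiny = 0x1p-1074;
    volatile double half = tiny * 0.5;
    return half;
}

/* Hypothetical helper: a result counts as an error only if it is
 * not finite, as in the !isfinite() test discussed in the thread. */
int looks_like_error(double result)
{
    return !isfinite(result);
}
```

Here looks_like_error(halve_smallest_subnormal()) is 0: the underflowed
result is an ordinary finite zero, so a finiteness test alone does not treat
it as an error. Any error for this case has to come from a separate path,
such as matherr() setting the FP exception flag.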
