[erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re: ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD 10.1-STABLE

Kenji Rikitake kenji@REDACTED
Mon Apr 27 03:29:33 CEST 2015


Mikael:

I applied and tested your debugging patch.
The commit is at:
https://github.com/jj1bdx/otp/commit/71bfef44f99a01a2b3679bebbc41df1716ea00e5
The patched tree (18.0-rc1 plus the patch) is available on this branch:
https://github.com/jj1bdx/otp/commits/18.0-FPE-patch

Here is a brief summary of the test results.

In each case the error message was repeated 100000 to 150000 times
(depending on the BEAM code) during the execution of test "interval_int" in
rand_SUITE.erl, at
https://github.com/jj1bdx/emprng/blob/38142e3d0c02b979723082e610f3850d1814afe8/test/rand_SUITE.erl#L179

In the optimized build (with CFLAGS = "-O3 -fstack-protector"):

Note: the values 0x4a28a7 and 0x4a2 in the following abbreviated log share
their first three hex digits. I also observed this on another build with
different address values, so my guess is that 0x4a2 is a truncated form of
0x4a28a7.

fpe_sig_action: FPE at 0x4a28a7
ERTS_FP_CHECK_INIT at 0x502153: detected unhandled FPE at 0x4a2

in erts/emulator/beam/erl_arith.c
(gdb) info symbol 0x502153
erts_gc_mixed_plus + 547 in section .text

in erts/emulator/beam/erl_bif_guard.c
(gdb) info symbol 0x4a28a7
erts_gc_trunc_1 + 407 in section .text
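
To make the gdb output above easier to read, here is a minimal C sketch of
what "info symbol" does: find the nearest symbol at or below an address and
report the offset from it. The two-entry table is an assumption built only
from the output above (each start address is the reported address minus the
reported offset). struct sym and resolve() are hypothetical names, not part
of ERTS or gdb.

```c
#include <stddef.h>

struct sym {
    unsigned long start;   /* address of the first byte of the function */
    const char *name;
};

/* Start addresses derived from the gdb output above
 * (reported address minus reported offset), sorted by start. */
static const struct sym symtab[] = {
    { 0x4a28a7UL - 407, "erts_gc_trunc_1" },
    { 0x502153UL - 547, "erts_gc_mixed_plus" },
};

/* Returns the last symbol whose start is <= addr and stores the offset,
 * or returns NULL if addr lies below every known symbol.
 * Simplification: symbol sizes are not tracked, so an address far past
 * the last symbol still resolves to it. */
const struct sym *resolve(unsigned long addr, unsigned long *offset)
{
    const struct sym *best = NULL;
    size_t i;

    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (symtab[i].start <= addr)
            best = &symtab[i];
    }
    if (best != NULL)
        *offset = addr - best->start;
    return best;
}
```

With this table, resolve(0x502153, &off) returns the erts_gc_mixed_plus
entry with off == 547, while resolve(0x4a2, &off) returns NULL because
0x4a2 lies below every symbol in the table.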

In another build, this time a debug build (with CFLAGS = "-g -fstack-protector", without -O3):

Note: 0x4cc0b5 and 0x4cc again share their first three hex digits.

fpe_sig_action: FPE at 0x4cc0b5
ERTS_FP_CHECK_INIT at 0x571e60: detected unhandled FPE at 0x4cc

in erts/emulator/sys/unix/erl_unix_sys.h
(gdb) info symbol 0x571e60
__ERTS_FP_CHECK_INIT + 64 in section .text

in erts/emulator/beam/erl_bif_guard.c
(gdb) info symbol 0x4cc0b5
gc_double_to_integer + 501 in section .text
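
One observation that fits the truncation guess: the text
"ERTS_FP_CHECK_INIT at 0x502153: detected unhandled FPE at 0x4a2" is exactly
63 characters long, which is what snprintf() leaves in a 64-byte buffer (63
characters plus the terminating NUL). The sketch below reproduces that
effect. It is an illustration under assumptions: I have not verified the
buffer size or the format string used by erts_fp_check_init_error(),
format_check_init_msg() is a hypothetical helper, and "0x%lx" stands in for
"%p" so that the output is the same on every platform.

```c
#include <stdio.h>

/* Formats the message into a caller-supplied buffer. snprintf() stores at
 * most size - 1 characters plus a NUL and returns the length the complete
 * message would have had. */
int format_check_init_msg(char *buf, size_t size,
                          unsigned long caller, unsigned long fpe_pc)
{
    return snprintf(buf, size,
                    "ERTS_FP_CHECK_INIT at 0x%lx: "
                    "detected unhandled FPE at 0x%lx\r\n",
                    caller, fpe_pc);
}
```

Called with a 64-byte buffer, 0x502153 and 0x4a28a7, the buffer ends in
"FPE at 0x4a2" and the return value is 68, which is larger than the buffer
and is how a caller could detect the truncation. The same arithmetic applies
to 0x571e60 and 0x4cc0b5, since all of these addresses have six hex digits.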

I still cannot determine the real cause; this is all I have found so far.

Kenji Rikitake


++> Mikael Pettersson <mikpelinux@REDACTED> [2015-04-26 12:19:47 +0200]:
> Date: Sun, 26 Apr 2015 12:19:47 +0200
> From: Mikael Pettersson <mikpelinux@REDACTED>
> To: Kenji Rikitake <kenji@REDACTED>
> Cc: Mikael Pettersson <mikpelinux@REDACTED>, erlang-bugs@REDACTED
> Subject: Re: [erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re:
>  ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD
>  10.1-STABLE
> 
> Kenji Rikitake writes:
>  > Mikael:
>  > 
>  > So far I have only reached the following analysis:
>  > 
>  > I suspect the FPE was detected when
>  > ERTS_FP_CHECK_INIT() was called from
>  > erts_gc_mixed_plus() in
>  > erts/emulator/beam/erl_arith.c.
>  > 
>  > (My experience with gdb, especially with multi-threaded code, is
>  > rather limited.)
>  > 
>  > The command called:
>  > 
>  > erl -pa ebin -pa test -s rand_SUITE test
>  > 
>  > The error message repeated (and beam.smp crashed):
>  > 
>  > ERTS_FP_CHECK_INIT at 0x502ef3: detected unhandled FPE at 0x4a3
>  > 
>  > gdb result with beam.smp.core:
>  > 
>  > GNU gdb 6.1.1 [FreeBSD]
>  > Copyright 2004 Free Software Foundation, Inc.
>  > GDB is free software, covered by the GNU General Public License, and you are
>  > welcome to change it and/or distribute copies of it under certain conditions.
>  > Type "show copying" to see the conditions.
>  > There is absolutely no warranty for GDB.  Type "show warranty" for details.
>  > This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
>  > Core was generated by `beam.smp'.
>  > Program terminated with signal 6, Aborted.
>  > Reading symbols from /lib/libutil.so.9...(no debugging symbols found)...done.
>  > Loaded symbols for /lib/libutil.so.9
>  > Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
>  > Loaded symbols for /lib/libm.so.5
>  > Reading symbols from /usr/lib/libelf.so.1...(no debugging symbols found)...done.
>  > Loaded symbols for /usr/lib/libelf.so.1
>  > Reading symbols from /lib/libncurses.so.8...(no debugging symbols found)...done.
>  > Loaded symbols for /lib/libncurses.so.8
>  > Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
>  > Loaded symbols for /lib/libz.so.6
>  > Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
>  > Loaded symbols for /lib/libthr.so.3
>  > Reading symbols from /usr/lib/librt.so.1...(no debugging symbols found)...done.
>  > Loaded symbols for /usr/lib/librt.so.1
>  > Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
>  > Loaded symbols for /lib/libc.so.7
>  > Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
>  > Loaded symbols for /libexec/ld-elf.so.1
>  > #0  0x0000000801b7fb8a in thr_kill () from /lib/libc.so.7
>  > [New Thread 80240b000 (LWP 101044/beam.smp)]
>  > [New Thread 80240ac00 (LWP 101042/beam.smp)]
>  > [New Thread 80240a800 (LWP 101040/beam.smp)]
>  > [New Thread 80240a400 (LWP 101036/beam.smp)]
>  > [New Thread 80240a000 (LWP 100892/beam.smp)]
>  > [New Thread 802409c00 (LWP 100803/beam.smp)]
>  > [New Thread 802409800 (LWP 100651/beam.smp)]
>  > [New Thread 802409400 (LWP 100522/beam.smp)]
>  > [New Thread 802409000 (LWP 100507/beam.smp)]
>  > [New Thread 802408c00 (LWP 100459/beam.smp)]
>  > [New Thread 802408800 (LWP 100415/beam.smp)]
>  > [New Thread 802408400 (LWP 100309/beam.smp)]
>  > [New Thread 802408000 (LWP 100268/beam.smp)]
>  > [New Thread 802407c00 (LWP 100254/beam.smp)]
>  > [New Thread 802407800 (LWP 100253/beam.smp)]
>  > [New Thread 802407400 (LWP 100248/beam.smp)]
>  > [New Thread 802407000 (LWP 100245/beam.smp)]
>  > [New Thread 802406800 (LWP 100238/beam.smp)]
>  > [New Thread 802406400 (LWP 100142/beam.smp)]
>  > (gdb) bt
>  > #0  0x0000000801b7fb8a in thr_kill () from /lib/libc.so.7
>  > #1  0x0000000801b7faf6 in raise () from /lib/libc.so.7
>  > #2  0x0000000801b7e2e9 in abort () from /lib/libc.so.7
>  > #3  0x000000000049e7d7 in erl_exit_vv ()
>  > #4  0x000000000049c813 in erl_exit ()
>  > #5  0x000000000063c49b in erts_fp_check_init_error ()
>  > #6  0x0000000000502ef3 in erts_gc_mixed_plus ()
>  > #7  0x00000000004615e7 in process_main ()
>  > #8  0x00000000004ece11 in sched_thread_func ()
>  > #9  0x000000000069f6ac in thr_wrapper ()
>  > #10 0x00000008016196d5 in pthread_create () from /lib/libthr.so.3
>  > #11 0x0000000000000000 in ?? ()
>  > (gdb) info symbol 0x502ef3
>  > erts_gc_mixed_plus + 547 in section .text
>  > (gdb) q
> 
> Ok, so this is the ERTS_FP_CHECK_INIT() at the start of erts_gc_mixed_plus()
> which detects a pending FPE, which is not allowed at this point.
> 
> There are really only three reasons why this might trigger:
> 1. We got an FP exception outside of checked code (checked code being
>    the region between ERTS_FP_CHECK_INIT() and ERTS_FP_ERROR()).
> 2. A libc or libm function called matherr() outside of checked code.
> 3. A process' fp_exception field is uninitialized or clobbered.
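
The three cases above can be pictured with a small model of the check
protocol. This is a simplified sketch, not the actual ERTS macros: struct
fp_state, record_fpe(), check_init() and check_error() are hypothetical
names, and the model assumes a single field that is zero when nothing is
pending and otherwise holds the faulting PC.

```c
/* Hypothetical stand-in for a process' fp_exception field:
 * 0 means nothing is pending, otherwise it holds the faulting PC. */
struct fp_state {
    volatile unsigned long fp_exception;
};

/* Models what a SIGFPE handler or matherr() does: remember where the
 * fault happened. */
void record_fpe(struct fp_state *s, unsigned long pc)
{
    s->fp_exception = pc;
}

/* Models the check at the start of a checked region. A non-zero result
 * is the "unhandled FPE" case: something set the field while no checked
 * region was active (cases 1 and 2), or the field held garbage (case 3).
 * The field is cleared so the region starts clean. */
unsigned long check_init(struct fp_state *s)
{
    unsigned long stale = s->fp_exception;
    s->fp_exception = 0;
    return stale;
}

/* Models the check at the end of a checked region: returns 1 if an FP
 * exception was recorded inside the region, and clears the field. */
int check_error(struct fp_state *s)
{
    int failed = s->fp_exception != 0;
    s->fp_exception = 0;
    return failed;
}
```

In this model a fault recorded between check_init() and check_error() is
consumed by check_error() and the next check_init() returns 0. A fault
recorded after check_error() stays pending and is returned by the next
check_init(), which corresponds to the message seen in the logs.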
> 
> Please try the attached debugging patch for 18-rc1.  It enables
> logging of FP exceptions and matherr(), which should tell us more
> about what's really going on.
> 
> I'm still bothered about the suspiciously low PC address (0x4a3)
> reported.  Can you check if that actually corresponds to an address
> in beam.smp or one of its dynamically linked libraries?
> 
> /Mikael
> 

> diff --git a/erts/emulator/sys/unix/sys_float.c b/erts/emulator/sys/unix/sys_float.c
> index 2ffa649..d35bf4b 100644
> --- a/erts/emulator/sys/unix/sys_float.c
> +++ b/erts/emulator/sys/unix/sys_float.c
> @@ -638,7 +638,7 @@ static void fpe_sig_action(int sig, siginfo_t *si, void *puc)
>      fpstate->mxcsr = 0x1F80;
>      fpstate->sw &= ~0xFF;
>  #endif
> -#if 0
> +#if 1
>      {
>  	char buf[64];
>  	snprintf(buf, sizeof buf, "%s: FPE at %p\r\n", __FUNCTION__, (void*)pc);
> @@ -839,6 +839,12 @@ matherr(struct exception *exc)
>  {
>  #if !defined(NO_FPE_SIGNALS)
>      volatile unsigned long *fpexnp = erts_get_current_fp_exception();
> +#if 1
> +    char buf[128];
> +    snprintf(buf, sizeof buf, "sys_float.c:matherr() type %d from %s at %p\r\n",
> +	     exc->type, exc->name, (void*)__builtin_return_address(0));
> +    write(2, buf, strlen(buf));
> +#endif
>      if (fpexnp != NULL)
>  	*fpexnp = (unsigned long)__builtin_return_address(0);
>  #endif
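
A side note on the logging in the patch above: snprintf() is not on the
POSIX list of async-signal-safe functions, which is acceptable for a
short-lived debugging aid but worth knowing when calling it from a signal
handler such as fpe_sig_action(). A minimal sketch of logging an address
with nothing but write() follows. format_hex() and log_fpe_address() are
hypothetical helpers, not part of sys_float.c.

```c
#include <stddef.h>
#include <unistd.h>

/* Writes "0x" plus the lowercase hex digits of v into buf, without a NUL
 * terminator. Returns the number of bytes written, or 0 if buf is too
 * small. */
size_t format_hex(unsigned long v, char *buf, size_t size)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0;
    size_t i;

    do {                            /* collect digits, lowest first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);

    if (size < n + 2)
        return 0;
    buf[0] = '0';
    buf[1] = 'x';
    for (i = 0; i < n; i++)         /* reverse into the output */
        buf[2 + i] = tmp[n - 1 - i];
    return n + 2;
}

/* Logs "FPE at 0x...\r\n" to stderr using only write(). */
void log_fpe_address(unsigned long pc)
{
    static const char prefix[] = "FPE at ";
    char buf[sizeof prefix + 2 + 2 * sizeof pc + 2];
    size_t len = 0;

    for (; len < sizeof prefix - 1; len++)
        buf[len] = prefix[len];
    len += format_hex(pc, buf + len, sizeof buf - len);
    buf[len++] = '\r';
    buf[len++] = '\n';
    if (write(2, buf, len) < 0) {
        /* nothing useful can be done about a failed write here */
    }
}
```

The buffer is sized from the widest possible address, so the output cannot
be cut short the way a fixed small buffer can.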

> 
>  > 
>  > ++> Kenji Rikitake <kenji@REDACTED> [2015-04-25 22:19:34 +0900]:
>  > > Date: Sat, 25 Apr 2015 22:19:34 +0900
>  > > From: Kenji Rikitake <kenji@REDACTED>
>  > > To: Mikael Pettersson <mikpelinux@REDACTED>
>  > > Cc: erlang-bugs@REDACTED
>  > > Subject: [erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re:
>  > >  ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD
>  > >  10.1-STABLE
>  > > 
>  > > Mikael:
>  > > 
>  > > > I strongly suspect a FreeBSD issue wrt FPE:s.  Can you rebuild OTP with
>  > > > --disable-hipe --enable-fp-exceptions and then repeat your tests?
>  > > 
>  > > Running the emprng tests on 18.0 built with the above options produced
>  > > the following error many times:
>  > > 
>  > > ERTS_FP_CHECK_INIT at 0x502ef3: detected unhandled FPE at 0x4a3
>  > > 
>  > > So this is not a HiPE issue; it is highly likely to be a FreeBSD FPE issue.
>  > > 
>  > > I'll run further tests later.
>  > > 
>  > > Kenji Rikitake
>  > > 
>  > > ++> Mikael Pettersson <mikpelinux@REDACTED> [2015-04-25 12:57:42 +0200]:
>  > > > Kenji Rikitake writes:
>  > > >  > I've seen a massive number of errors when running a common test on
>  > > >  > 18.0-rc1 with HiPE, such as:
>  > > >  > 
>  > > >  > ERTS_FP_CHECK_INIT at 0x50e193: detected unhandled FPE at 0x4ad
>  > > >  > 
>  > > >  > This didn't happen when HiPE is disabled (--disable-hipe).
>  > > >  > 
>  > > >  > I have traced in the source that this message is emitted by
>  > > >  > erts_fp_check_init_error() in erts/emulator/sys/unix/sys_float.c,
>  > > >  > most likely called from
>  > > >  > hipe_fclearerror_error() in erts/emulator/hipe/hipe_native_bif.c.
>  > > >  > 
>  > > >  > The running environment is on FreeBSD amd64 10.1-STABLE #64 r281235,
>  > > >  > and the kerl compilation options:
>  > > >  > 
>  > > >  > export CC=clang CXX=clang CFLAGS="-O3 -fstack-protector" LDFLAGS="-fstack-protector" MAKEFLAGS="-j8"
>  > > >  > KERL_CONFIGURE_OPTIONS="--disable-native-libs --enable-vm-probes --with-dynamic-trace=dtrace --with-ssl=/usr/local --with-javac --enable-hipe --enable-kernel-poll --with-wx-config=/usr/local/bin/wxgtk2u-2.8-config --without-odbc --enable-threads --enable-sctp --enable-smp-support --disable-silent-rules"
>  > > >  > 
>  > > >  > You can check this out by:
>  > > >  > 
>  > > >  > git clone https://github.com/jj1bdx/emprng/
>  > > >  > cd emprng
>  > > >  > make tests
>  > > > 
>  > > > I'm not able to reproduce any unhandled FPE:s on Linux/x86_64 with 18.0-rc1
>  > > > configured with --enable-hipe --enable-fp-exceptions.
>  > > > 
>  > > > I strongly suspect a FreeBSD issue wrt FPE:s.  Can you rebuild OTP with
>  > > > --disable-hipe --enable-fp-exceptions and then repeat your tests?
>  > > > 
>  > > > It would also be helpful if you attached a debugger to beam.smp, put a
>  > > > breakpoint in erts_fp_check_init_error(), and took a backtrace from the
>  > > > thread when that breakpoint is hit.  (You can also try to map the PC
>  > > > value 0x50e193 reported above to the corresponding C function via beam.smp's
>  > > > symbol table.)
>  > > > 
>  > > > Finally, I find the 0x4ad address suspiciously low.  Is that address range
>  > > > even mapped in your beam.smp process?  I don't know how to check that on
>  > > > FreeBSD, but on Linux I would look in /proc/${pid}/maps.
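
For the Linux case mentioned above, each line of /proc/<pid>/maps starts
with a "start-end" address range in hex, so the check reduces to a range
test. A minimal sketch follows; addr_in_maps_line() is a hypothetical helper
and the sample line in the usage note is made up for illustration. On
FreeBSD, "procstat -v <pid>" lists a process' memory mappings, but its
output format differs, so this parser would not apply to it as-is.

```c
#include <stdio.h>

/* Returns 1 if addr falls inside the [start, end) range at the beginning
 * of a Linux /proc/<pid>/maps line, 0 if it does not, and -1 if the line
 * cannot be parsed. */
int addr_in_maps_line(const char *line, unsigned long addr)
{
    unsigned long start, end;

    if (sscanf(line, "%lx-%lx", &start, &end) != 2)
        return -1;
    return addr >= start && addr < end;
}
```

For a hypothetical text mapping such as
"00400000-006a4000 r-xp 00000000 08:01 1234 /path/to/beam.smp", the address
0x4a28a7 is inside the range, while a low value such as 0x4ad is not.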
>  > > > 
>  > > > /Mikael
> 