[erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re: ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD 10.1-STABLE
Kenji Rikitake
kenji@REDACTED
Mon Apr 27 03:29:33 CEST 2015
Mikael:
The patch is applied and tested.
The patch is at:
https://github.com/jj1bdx/otp/commit/71bfef44f99a01a2b3679bebbc41df1716ea00e5
And it's available as
https://github.com/jj1bdx/otp/commits/18.0-FPE-patch
(18.0-rc1 plus the patch).
The following is a brief result of the test.
In each case the error message was repeated 100000 to 150000 times
(depending on the BEAM code) during the execution of test "interval_int" in
rand_SUITE.erl, at
https://github.com/jj1bdx/emprng/blob/38142e3d0c02b979723082e610f3850d1814afe8/test/rand_SUITE.erl#L179
In the build (with CFLAGS = "-O3 -fstack-protector"):
Note: the values 0x4a28a7 and 0x4a2 in the following abbreviated log share
their first three hex digits. I also observed this on another build with
different address values. So my guess is that 0x4a2 is a truncated form of
0x4a28a7.
fpe_sig_action: FPE at 0x4a28a7
ERTS_FP_CHECK_INIT at 0x502153: detected unhandled FPE at 0x4a2
in erts/emulator/beam/erl_arith.c
(gdb) info symbol 0x502153
erts_gc_mixed_plus + 547 in section .text
in erts/emulator/beam/erl_bif_guard.c
(gdb) info symbol 0x4a28a7
erts_gc_trunc_1 + 407 in section .text
In another debug build (with CFLAGS = "-g -fstack-protector", without -O3):
Note the similarity between the values 0x4cc0b5 and 0x4cc.
fpe_sig_action: FPE at 0x4cc0b5
ERTS_FP_CHECK_INIT at 0x571e60: detected unhandled FPE at 0x4cc
in erts/emulator/sys/unix/erl_unix_sys.h
(gdb) info symbol 0x571e60
__ERTS_FP_CHECK_INIT + 64 in section .text
in erts/emulator/beam/erl_bif_guard.c
(gdb) info symbol 0x4cc0b5
gc_double_to_integer + 501 in section .text
I still cannot determine the real cause; this is all I have so far.
Kenji Rikitake
++> Mikael Pettersson <mikpelinux@REDACTED> [2015-04-26 12:19:47 +0200]:
> Date: Sun, 26 Apr 2015 12:19:47 +0200
> From: Mikael Pettersson <mikpelinux@REDACTED>
> To: Kenji Rikitake <kenji@REDACTED>
> Cc: Mikael Pettersson <mikpelinux@REDACTED>, erlang-bugs@REDACTED
> Subject: Re: [erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re:
> ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD
> 10.1-STABLE
>
> Kenji Rikitake writes:
> > Mikael:
> >
> > So far I have only reached the following analysis:
> >
> > I suspect the FPE was raised when
> > ERTS_FP_CHECK_INIT() was called from
> > erts_mixed_plus() in
> > erts/emulator/beam/erl_arith.c.
> >
> > (My experience with gdb especially with multi-threaded code is
> > rather limited.)
> >
> > The command called:
> >
> > erl -pa ebin -pa test -s rand_SUITE test
> >
> > The error message repeated (and beam.smp crashed):
> >
> > ERTS_FP_CHECK_INIT at 0x502ef3: detected unhandled FPE at 0x4a3
> >
> > gdb result with beam.smp.core:
> >
> > GNU gdb 6.1.1 [FreeBSD]
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
> > Core was generated by `beam.smp'.
> > Program terminated with signal 6, Aborted.
> > Reading symbols from /lib/libutil.so.9...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libutil.so.9
> > Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libm.so.5
> > Reading symbols from /usr/lib/libelf.so.1...(no debugging symbols found)...done.
> > Loaded symbols for /usr/lib/libelf.so.1
> > Reading symbols from /lib/libncurses.so.8...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libncurses.so.8
> > Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libz.so.6
> > Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libthr.so.3
> > Reading symbols from /usr/lib/librt.so.1...(no debugging symbols found)...done.
> > Loaded symbols for /usr/lib/librt.so.1
> > Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libc.so.7
> > Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
> > Loaded symbols for /libexec/ld-elf.so.1
> > #0 0x0000000801b7fb8a in thr_kill () from /lib/libc.so.7
> > [New Thread 80240b000 (LWP 101044/beam.smp)]
> > [New Thread 80240ac00 (LWP 101042/beam.smp)]
> > [New Thread 80240a800 (LWP 101040/beam.smp)]
> > [New Thread 80240a400 (LWP 101036/beam.smp)]
> > [New Thread 80240a000 (LWP 100892/beam.smp)]
> > [New Thread 802409c00 (LWP 100803/beam.smp)]
> > [New Thread 802409800 (LWP 100651/beam.smp)]
> > [New Thread 802409400 (LWP 100522/beam.smp)]
> > [New Thread 802409000 (LWP 100507/beam.smp)]
> > [New Thread 802408c00 (LWP 100459/beam.smp)]
> > [New Thread 802408800 (LWP 100415/beam.smp)]
> > [New Thread 802408400 (LWP 100309/beam.smp)]
> > [New Thread 802408000 (LWP 100268/beam.smp)]
> > [New Thread 802407c00 (LWP 100254/beam.smp)]
> > [New Thread 802407800 (LWP 100253/beam.smp)]
> > [New Thread 802407400 (LWP 100248/beam.smp)]
> > [New Thread 802407000 (LWP 100245/beam.smp)]
> > [New Thread 802406800 (LWP 100238/beam.smp)]
> > [New Thread 802406400 (LWP 100142/beam.smp)]
> > (gdb) bt
> > #0 0x0000000801b7fb8a in thr_kill () from /lib/libc.so.7
> > #1 0x0000000801b7faf6 in raise () from /lib/libc.so.7
> > #2 0x0000000801b7e2e9 in abort () from /lib/libc.so.7
> > #3 0x000000000049e7d7 in erl_exit_vv ()
> > #4 0x000000000049c813 in erl_exit ()
> > #5 0x000000000063c49b in erts_fp_check_init_error ()
> > #6 0x0000000000502ef3 in erts_gc_mixed_plus ()
> > #7 0x00000000004615e7 in process_main ()
> > #8 0x00000000004ece11 in sched_thread_func ()
> > #9 0x000000000069f6ac in thr_wrapper ()
> > #10 0x00000008016196d5 in pthread_create () from /lib/libthr.so.3
> > #11 0x0000000000000000 in ?? ()
> > (gdb) info symbol 0x502ef3
> > erts_gc_mixed_plus + 547 in section .text
> > (gdb) q
>
> Ok, so this is the ERTS_FP_CHECK_INIT() at the start of erts_gc_mixed_plus()
> which detects a pending FPE, which is not allowed at this point.
>
> There are really only three reasons why this might trigger:
> 1. We got an FP exception outside of checked code (between
> ERTS_FP_CHECK_INIT() and ERTS_FP_ERROR()).
> 2. A libc or libm function called matherr() outside of checked code.
> 3. A process' fp_exception field is uninitialized or clobbered.
>
> Please try the attached debugging patch for 18-rc1. It enables
> logging of FP exceptions and matherr(), which should tell us more
> about what's really going on.
>
> I'm still bothered about the suspiciously low PC address (0x4a3)
> reported. Can you check if that actually corresponds to an address
> in beam.smp or one of its dynamically linked libraries?
>
> /Mikael
>
> diff --git a/erts/emulator/sys/unix/sys_float.c b/erts/emulator/sys/unix/sys_float.c
> index 2ffa649..d35bf4b 100644
> --- a/erts/emulator/sys/unix/sys_float.c
> +++ b/erts/emulator/sys/unix/sys_float.c
> @@ -638,7 +638,7 @@ static void fpe_sig_action(int sig, siginfo_t *si, void *puc)
> fpstate->mxcsr = 0x1F80;
> fpstate->sw &= ~0xFF;
> #endif
> -#if 0
> +#if 1
> {
> char buf[64];
> snprintf(buf, sizeof buf, "%s: FPE at %p\r\n", __FUNCTION__, (void*)pc);
> @@ -839,6 +839,12 @@ matherr(struct exception *exc)
> {
> #if !defined(NO_FPE_SIGNALS)
> volatile unsigned long *fpexnp = erts_get_current_fp_exception();
> +#if 1
> + char buf[128];
> + snprintf(buf, sizeof buf, "sys_float.c:matherr() type %d from %s at %p\r\n",
> + exc->type, exc->name, (void*)__builtin_return_address(0));
> + write(2, buf, strlen(buf));
> +#endif
> if (fpexnp != NULL)
> *fpexnp = (unsigned long)__builtin_return_address(0);
> #endif
>
> >
> > ++> Kenji Rikitake <kenji@REDACTED> [2015-04-25 22:19:34 +0900]:
> > > Date: Sat, 25 Apr 2015 22:19:34 +0900
> > > From: Kenji Rikitake <kenji@REDACTED>
> > > To: Mikael Pettersson <mikpelinux@REDACTED>
> > > Cc: erlang-bugs@REDACTED
> > > Subject: [erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re:
> > > ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD
> > > 10.1-STABLE
> > >
> > > Mikael:
> > >
> > > > I strongly suspect a FreeBSD issue wrt FPE:s. Can you rebuild OTP with
> > > > --disable-hipe --enable-fp-exceptions and then repeat your tests?
> > >
> > > Executing emprng tests on 18.0 built with the above options produced
> > > the following error many times:
> > >
> > > ERTS_FP_CHECK_INIT at 0x502ef3: detected unhandled FPE at 0x4a3
> > >
> > > So this is not a HiPE issue; it is highly likely to be a FreeBSD FPE issue.
> > >
> > > I'll run further tests later.
> > >
> > > Kenji Rikitake
> > >
> > > ++> Mikael Pettersson <mikpelinux@REDACTED> [2015-04-25 12:57:42 +0200]:
> > > > Kenji Rikitake writes:
> > > > > I've seen a massive number of errors when running a common test on
> > > > > 18.0-rc1 with HiPE, such as:
> > > > >
> > > > > ERTS_FP_CHECK_INIT at 0x50e193: detected unhandled FPE at 0x4ad
> > > > >
> > > > > This didn't happen when HiPE is disabled (--disable-hipe).
> > > > >
> > > > > I have traced this message in the source: it is emitted by
> > > > > erts_fp_check_init_error() in erts/emulator/sys/unix/sys_float.c,
> > > > > most likely called from
> > > > > hipe_fclearerror_error() in erts/emulator/hipe/hipe_native_bif.c.
> > > > >
> > > > > The running environment is on FreeBSD amd64 10.1-STABLE #64 r281235,
> > > > > and the kerl compilation options:
> > > > >
> > > > > export CC=clang CXX=clang CFLAGS="-O3 -fstack-protector" LDFLAGS="-fstack-protector" MAKEFLAGS="-j8"
> > > > > KERL_CONFIGURE_OPTIONS="--disable-native-libs --enable-vm-probes --with-dynamic-trace=dtrace --with-ssl=/usr/local --with-javac --enable-hipe --enable-kernel-poll --with-wx-config=/usr/local/bin/wxgtk2u-2.8-config --without-odbc --enable-threads --enable-sctp --enable-smp-support --disable-silent-rules"
> > > > >
> > > > > You can check this out by:
> > > > >
> > > > > git clone https://github.com/jj1bdx/emprng/
> > > > > cd emprng
> > > > > make tests
> > > >
> > > > I'm not able to reproduce any unhandled FPE:s on Linux/x86_64 with 18.0-rc1
> > > > configured with --enable-hipe --enable-fp-exceptions.
> > > >
> > > > I strongly suspect a FreeBSD issue wrt FPE:s. Can you rebuild OTP with
> > > > --disable-hipe --enable-fp-exceptions and then repeat your tests?
> > > >
> > > > It would also be helpful if you attached a debugger to beam.smp, put a
> > > > breakpoint in erts_fp_check_init_error(), and took a backtrace from the
> > > > thread when that breakpoint is hit. (You can also try to map the PC
> > > > value 0x50e193 reported above to the corresponding C function via beam.smp's
> > > > symbol table.)
> > > >
> > > > Finally, I find the 0x4ad address suspiciously low. Is that address range
> > > > even mapped in your beam.smp process? I don't know how to check that on
> > > > FreeBSD, but on Linux I would look in /proc/${pid}/maps.
> > > >
> > > > /Mikael