[erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re: ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD 10.1-STABLE
Mikael Pettersson
mikpelinux@REDACTED
Mon Apr 27 10:15:47 CEST 2015
Kenji Rikitake writes:
> Mikael:
>
> The patch is applied and tested.
> The patch is at:
> https://github.com/jj1bdx/otp/commit/71bfef44f99a01a2b3679bebbc41df1716ea00e5
> And it's available as
> https://github.com/jj1bdx/otp/commits/18.0-FPE-patch
> (18.0-rc1 plus the patch).
>
> The following is a brief result of the test.
>
> In each case the error message was repeated 100000 to 150000 times
> (depending on the BEAM code) during the execution of test "interval_int" in
> rand_SUITE.erl, at
> https://github.com/jj1bdx/emprng/blob/38142e3d0c02b979723082e610f3850d1814afe8/test/rand_SUITE.erl#L179
>
> In the build (with CFLAGS = "-O3 -fstack-protector"):
>
> Note: the values 0x4a28a7 and 0x4a2 in the following abbreviated log share
> their first three hex digits. I also observed this on another build with
> different address values. So my guess is that 0x4a2 is a truncated form of
> 0x4a28a7.
Yes, the buffer in erts_fp_check_init_error() is too small. I'll bump it to 128.
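The truncation can be reproduced outside the emulator. The following is a minimal sketch, not the actual erts_fp_check_init_error() code: the helper name format_fp_check_msg() and the format string are assumptions reconstructed from the log output quoted in this thread, and "0x%lx" stands in for whatever conversion the real code uses.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of ERTS): formats a message shaped like
 * the one seen in the log into a caller-supplied buffer.
 *
 * snprintf() never writes more than `size` bytes including the
 * terminating NUL, and returns the length the full message would have
 * had. A return value >= size therefore means the output was truncated.
 */
static int format_fp_check_msg(char *buf, size_t size,
                               unsigned long check_pc,
                               unsigned long fpe_pc)
{
    return snprintf(buf, size,
                    "ERTS_FP_CHECK_INIT at 0x%lx: detected unhandled FPE at 0x%lx\r\n",
                    check_pc, fpe_pc);
}

/* Returns 1 if the message did not fit into a buffer of `size` bytes. */
static int fp_check_msg_truncated(int snprintf_result, size_t size)
{
    return snprintf_result < 0 || (size_t)snprintf_result >= size;
}
```

With a 64-byte buffer and the addresses 0x502153 and 0x4a28a7, the text up to "FPE at " already takes 58 characters, so only "0x4a2" fits before the terminating NUL. That matches the reported line. With 128 bytes the full address and the trailing "\r\n" are kept.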
>
> fpe_sig_action: FPE at 0x4a28a7
> ERTS_FP_CHECK_INIT at 0x502153: detected unhandled FPE at 0x4a2
>
> in erts/emulator/beam/erl_arith.c
> (gdb) info symbol 0x502153
> erts_gc_mixed_plus + 547 in section .text
>
> in erts/emulator/beam/erl_bif_guard.c
> (gdb) info symbol 0x4a28a7
> erts_gc_trunc_1 + 407 in section .text
Can you provide a disassembly of erts_gc_trunc_1 from this build?
>
> In another debug build (with CFLAGS = "-g -fstack-protector", without -O3):
>
> Note: see the similarity of values 0x4cc0b5 and 0x4cc.
>
> fpe_sig_action: FPE at 0x4cc0b5
> ERTS_FP_CHECK_INIT at 0x571e60: detected unhandled FPE at 0x4cc
>
> in erts/emulator/sys/unix/erl_unix_sys.h
> (gdb) info symbol 0x571e60
> __ERTS_FP_CHECK_INIT + 64 in section .text
>
> in erts/emulator/beam/erl_bif_guard.c
> (gdb) info symbol 0x4cc0b5
> gc_double_to_integer + 501 in section .text
Can you provide a disassembly of gc_double_to_integer from this build?
Thanks,
/Mikael
>
> I still cannot determine the real cause; this is all I have found
> so far.
>
> Kenji Rikitake
>
>
> ++> Mikael Pettersson <mikpelinux@REDACTED> [2015-04-26 12:19:47 +0200]:
> > Date: Sun, 26 Apr 2015 12:19:47 +0200
> > From: Mikael Pettersson <mikpelinux@REDACTED>
> > To: Kenji Rikitake <kenji@REDACTED>
> > Cc: Mikael Pettersson <mikpelinux@REDACTED>, erlang-bugs@REDACTED
> > Subject: Re: [erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re:
> > ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD
> > 10.1-STABLE
> >
> > Kenji Rikitake writes:
> > > Mikael:
> > >
> > > So far I have only reached the following analysis:
> > >
> > > I suspect the FPE was raised when
> > > ERTS_FP_CHECK_INIT() was called from
> > > erts_mixed_plus() in
> > > erts/emulator/beam/erl_arith.c.
> > >
> > > (My experience with gdb especially with multi-threaded code is
> > > rather limited.)
> > >
> > > The command called:
> > >
> > > erl -pa ebin -pa test -s rand_SUITE test
> > >
> > > The error message repeated (and beam.smp crashed):
> > >
> > > ERTS_FP_CHECK_INIT at 0x502ef3: detected unhandled FPE at 0x4a3
> > >
> > > gdb result with beam.smp.core:
> > >
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > > This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
> > > Core was generated by `beam.smp'.
> > > Program terminated with signal 6, Aborted.
> > > Reading symbols from /lib/libutil.so.9...(no debugging symbols found)...done.
> > > Loaded symbols for /lib/libutil.so.9
> > > Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
> > > Loaded symbols for /lib/libm.so.5
> > > Reading symbols from /usr/lib/libelf.so.1...(no debugging symbols found)...done.
> > > Loaded symbols for /usr/lib/libelf.so.1
> > > Reading symbols from /lib/libncurses.so.8...(no debugging symbols found)...done.
> > > Loaded symbols for /lib/libncurses.so.8
> > > Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
> > > Loaded symbols for /lib/libz.so.6
> > > Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
> > > Loaded symbols for /lib/libthr.so.3
> > > Reading symbols from /usr/lib/librt.so.1...(no debugging symbols found)...done.
> > > Loaded symbols for /usr/lib/librt.so.1
> > > Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
> > > Loaded symbols for /lib/libc.so.7
> > > Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
> > > Loaded symbols for /libexec/ld-elf.so.1
> > > #0 0x0000000801b7fb8a in thr_kill () from /lib/libc.so.7
> > > [New Thread 80240b000 (LWP 101044/beam.smp)]
> > > [New Thread 80240ac00 (LWP 101042/beam.smp)]
> > > [New Thread 80240a800 (LWP 101040/beam.smp)]
> > > [New Thread 80240a400 (LWP 101036/beam.smp)]
> > > [New Thread 80240a000 (LWP 100892/beam.smp)]
> > > [New Thread 802409c00 (LWP 100803/beam.smp)]
> > > [New Thread 802409800 (LWP 100651/beam.smp)]
> > > [New Thread 802409400 (LWP 100522/beam.smp)]
> > > [New Thread 802409000 (LWP 100507/beam.smp)]
> > > [New Thread 802408c00 (LWP 100459/beam.smp)]
> > > [New Thread 802408800 (LWP 100415/beam.smp)]
> > > [New Thread 802408400 (LWP 100309/beam.smp)]
> > > [New Thread 802408000 (LWP 100268/beam.smp)]
> > > [New Thread 802407c00 (LWP 100254/beam.smp)]
> > > [New Thread 802407800 (LWP 100253/beam.smp)]
> > > [New Thread 802407400 (LWP 100248/beam.smp)]
> > > [New Thread 802407000 (LWP 100245/beam.smp)]
> > > [New Thread 802406800 (LWP 100238/beam.smp)]
> > > [New Thread 802406400 (LWP 100142/beam.smp)]
> > > (gdb) bt
> > > #0 0x0000000801b7fb8a in thr_kill () from /lib/libc.so.7
> > > #1 0x0000000801b7faf6 in raise () from /lib/libc.so.7
> > > #2 0x0000000801b7e2e9 in abort () from /lib/libc.so.7
> > > #3 0x000000000049e7d7 in erl_exit_vv ()
> > > #4 0x000000000049c813 in erl_exit ()
> > > #5 0x000000000063c49b in erts_fp_check_init_error ()
> > > #6 0x0000000000502ef3 in erts_gc_mixed_plus ()
> > > #7 0x00000000004615e7 in process_main ()
> > > #8 0x00000000004ece11 in sched_thread_func ()
> > > #9 0x000000000069f6ac in thr_wrapper ()
> > > #10 0x00000008016196d5 in pthread_create () from /lib/libthr.so.3
> > > #11 0x0000000000000000 in ?? ()
> > > (gdb) info symbol 0x502ef3
> > > erts_gc_mixed_plus + 547 in section .text
> > > (gdb) q
> >
> > Ok, so this is the ERTS_FP_CHECK_INIT() at the start of erts_gc_mixed_plus()
> > which detects a pending FPE, which is not allowed at this point.
> >
> > There are really only three reasons why this might trigger:
> > 1. We got an FP exception outside of checked code (checked code being
> > the region between ERTS_FP_CHECK_INIT() and ERTS_FP_ERROR()).
> > 2. A libc or libm function called matherr() outside of checked code.
> > 3. A process' fp_exception field is uninitialized or clobbered.
> >
> > Please try the attached debugging patch for 18-rc1. It enables
> > logging of FP exceptions and matherr(), which should tell us more
> > about what's really going on.
> >
> > I'm still bothered about the suspiciously low PC address (0x4a3)
> > reported. Can you check if that actually corresponds to an address
> > in beam.smp or one of its dynamically linked libraries?
> >
> > /Mikael
> >
>
> > diff --git a/erts/emulator/sys/unix/sys_float.c b/erts/emulator/sys/unix/sys_float.c
> > index 2ffa649..d35bf4b 100644
> > --- a/erts/emulator/sys/unix/sys_float.c
> > +++ b/erts/emulator/sys/unix/sys_float.c
> > @@ -638,7 +638,7 @@ static void fpe_sig_action(int sig, siginfo_t *si, void *puc)
> > fpstate->mxcsr = 0x1F80;
> > fpstate->sw &= ~0xFF;
> > #endif
> > -#if 0
> > +#if 1
> > {
> > char buf[64];
> > snprintf(buf, sizeof buf, "%s: FPE at %p\r\n", __FUNCTION__, (void*)pc);
> > @@ -839,6 +839,12 @@ matherr(struct exception *exc)
> > {
> > #if !defined(NO_FPE_SIGNALS)
> > volatile unsigned long *fpexnp = erts_get_current_fp_exception();
> > +#if 1
> > + char buf[128];
> > + snprintf(buf, sizeof buf, "sys_float.c:matherr() type %d from %s at %p\r\n",
> > + exc->type, exc->name, (void*)__builtin_return_address(0));
> > + write(2, buf, strlen(buf));
> > +#endif
> > if (fpexnp != NULL)
> > *fpexnp = (unsigned long)__builtin_return_address(0);
> > #endif
>
> >
> > >
> > > ++> Kenji Rikitake <kenji@REDACTED> [2015-04-25 22:19:34 +0900]:
> > > > Date: Sat, 25 Apr 2015 22:19:34 +0900
> > > > From: Kenji Rikitake <kenji@REDACTED>
> > > > To: Mikael Pettersson <mikpelinux@REDACTED>
> > > > Cc: erlang-bugs@REDACTED
> > > > Subject: [erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re:
> > > > ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD
> > > > 10.1-STABLE
> > > >
> > > > Mikael:
> > > >
> > > > > I strongly suspect a FreeBSD issue wrt FPE:s. Can you rebuild OTP with
> > > > > --disable-hipe --enable-fp-exceptions and then repeat your tests?
> > > >
> > > > Executing emprng tests on 18.0 built with the above options generated
> > > > the following error many times:
> > > >
> > > > ERTS_FP_CHECK_INIT at 0x502ef3: detected unhandled FPE at 0x4a3
> > > >
> > > > So this is not a HiPE issue, but is highly suspected to be a FreeBSD
> > > > FPE issue.
> > > >
> > > > I'll try further tests later.
> > > >
> > > > Kenji Rikitake
> > > >
> > > > ++> Mikael Pettersson <mikpelinux@REDACTED> [2015-04-25 12:57:42 +0200]:
> > > > > Kenji Rikitake writes:
> > > > > > I've seen a massive number of errors when running a common test on
> > > > > > 18.0-rc1 with HiPE, such as:
> > > > > >
> > > > > > ERTS_FP_CHECK_INIT at 0x50e193: detected unhandled FPE at 0x4ad
> > > > > >
> > > > > > This didn't happen when HiPE is disabled (--disable-hipe).
> > > > > >
> > > > > > I have traced this in the source: the message is sent from
> > > > > > erts_fp_check_init_error() in erts/emulator/sys/unix/sys_float.c,
> > > > > > most likely called from
> > > > > > hipe_fclearerror_error() in erts/emulator/hipe/hipe_native_bif.c.
> > > > > >
> > > > > > The running environment is on FreeBSD amd64 10.1-STABLE #64 r281235,
> > > > > > and the kerl compilation options:
> > > > > >
> > > > > > export CC=clang CXX=clang CFLAGS="-O3 -fstack-protector" LDFLAGS="-fstack-protector" MAKEFLAGS="-j8"
> > > > > > KERL_CONFIGURE_OPTIONS="--disable-native-libs --enable-vm-probes --with-dynamic-trace=dtrace --with-ssl=/usr/local --with-javac --enable-hipe --enable-kernel-poll --with-wx-config=/usr/local/bin/wxgtk2u-2.8-config --without-odbc --enable-threads --enable-sctp --enable-smp-support --disable-silent-rules"
> > > > > >
> > > > > > You can check this out by:
> > > > > >
> > > > > > git clone https://github.com/jj1bdx/emprng/
> > > > > > cd emprng
> > > > > > make tests
> > > > >
> > > > > I'm not able to reproduce any unhandled FPE:s on Linux/x86_64 with 18.0-rc1
> > > > > configured with --enable-hipe --enable-fp-exceptions.
> > > > >
> > > > > I strongly suspect a FreeBSD issue wrt FPE:s. Can you rebuild OTP with
> > > > > --disable-hipe --enable-fp-exceptions and then repeat your tests?
> > > > >
> > > > > It would also be helpful if you attached a debugger to beam.smp, put a
> > > > > breakpoint in erts_fp_check_init_error(), and took a backtrace from the
> > > > > thread when that breakpoint is hit. (You can also try to map the PC
> > > > > value 0x50e193 reported above to the corresponding C function via beam.smp's
> > > > > symbol table.)
> > > > >
> > > > > Finally, I find the 0x4ad address suspiciously low. Is that address range
> > > > > even mapped in your beam.smp process? I don't know how to check that on
> > > > > FreeBSD, but on Linux I would look in /proc/${pid}/maps.
> > > > >
> > > > > /Mikael
> > > > _______________________________________________
> > > > erlang-bugs mailing list
> > > > erlang-bugs@REDACTED
> > > > http://erlang.org/mailman/listinfo/erlang-bugs