[erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re: ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD 10.1-STABLE

Mikael Pettersson mikpelinux@REDACTED
Mon Apr 27 10:15:47 CEST 2015


Kenji Rikitake writes:
 > Mikael:
 > 
 > The patch is applied and tested.
 > The patch is at:
 > https://github.com/jj1bdx/otp/commit/71bfef44f99a01a2b3679bebbc41df1716ea00e5
 > And it's available as
 > https://github.com/jj1bdx/otp/commits/18.0-FPE-patch
 > (18.0-rc1 plus the patch).
 > 
 > Below is a brief summary of the test results.
 > 
 > In each case the error message was repeated 100000 to 150000 times
 > (depending on the BEAM code) during the execution of test "interval_int" in
 > rand_SUITE.erl, at
 > https://github.com/jj1bdx/emprng/blob/38142e3d0c02b979723082e610f3850d1814afe8/test/rand_SUITE.erl#L179
 > 
 > In the build (with CFLAGS = "-O3 -fstack-protector"):
 > 
 > Note: the values 0x4a28a7 and 0x4a2 in the following abbreviated log share
 > the same first three hex digits. I also observed this in another build with
 > different address values. So my guess is that 0x4a2 is a truncated form of
 > 0x4a28a7.

Yes, the buffer in erts_fp_check_init_error() is too small.  I'll bump it to 128.

 >
 > fpe_sig_action: FPE at 0x4a28a7
 > ERTS_FP_CHECK_INIT at 0x502153: detected unhandled FPE at 0x4a2
 > 
 > in erts/emulator/beam/erl_arith.c
 > (gdb) info symbol 0x502153
 > erts_gc_mixed_plus + 547 in section .text
 > 
 > in erts/emulator/beam/erl_bif_guard.c
 > (gdb) info symbol 0x4a28a7
 > erts_gc_trunc_1 + 407 in section .text

Can you provide a disassembly of erts_gc_trunc_1 from this build?

 > 
 > In another debug build (with CFLAGS = "-g -fstack-protector", without -O3):
 > 
 > Note the similarity between the values 0x4cc0b5 and 0x4cc.
 > 
 > fpe_sig_action: FPE at 0x4cc0b5
 > ERTS_FP_CHECK_INIT at 0x571e60: detected unhandled FPE at 0x4cc
 > 
 > in erts/emulator/sys/unix/erl_unix_sys.h
 > (gdb) info symbol 0x571e60
 > __ERTS_FP_CHECK_INIT + 64 in section .text
 > 
 > in erts/emulator/beam/erl_bif_guard.c
 > (gdb) info symbol 0x4cc0b5
 > gc_double_to_integer + 501 in section .text

Can you provide a disassembly of gc_double_to_integer from this build?

Thanks,

/Mikael

 > 
 > I still cannot pin down the real cause, but this is all I have so far.
 > 
 > Kenji Rikitake
 > 
 > 
 > ++> Mikael Pettersson <mikpelinux@REDACTED> [2015-04-26 12:19:47 +0200]:
 > > Date: Sun, 26 Apr 2015 12:19:47 +0200
 > > From: Mikael Pettersson <mikpelinux@REDACTED>
 > > To: Kenji Rikitake <kenji@REDACTED>
 > > Cc: Mikael Pettersson <mikpelinux@REDACTED>, erlang-bugs@REDACTED
 > > Subject: Re: [erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re:
 > >  ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD
 > >  10.1-STABLE
 > > 
 > > Kenji Rikitake writes:
 > >  > Mikael:
 > >  > 
 > >  > So far, this is as far as my analysis goes:
 > >  > 
 > >  > I suspect the FPE was raised when
 > >  > ERTS_FP_CHECK_INIT() was called from
 > >  > erts_mixed_plus() in
 > >  > erts/emulator/beam/erl_arith.c.
 > >  > 
 > >  > (My experience with gdb especially with multi-threaded code is
 > >  > rather limited.)
 > >  > 
 > >  > The command I ran:
 > >  > 
 > >  > erl -pa ebin -pa test -s rand_SUITE test
 > >  > 
 > >  > The error message was repeated (and beam.smp crashed):
 > >  > 
 > >  > ERTS_FP_CHECK_INIT at 0x502ef3: detected unhandled FPE at 0x4a3
 > >  > 
 > >  > gdb result with beam.smp.core:
 > >  > 
 > >  > GNU gdb 6.1.1 [FreeBSD]
 > >  > Copyright 2004 Free Software Foundation, Inc.
 > >  > GDB is free software, covered by the GNU General Public License, and you are
 > >  > welcome to change it and/or distribute copies of it under certain conditions.
 > >  > Type "show copying" to see the conditions.
 > >  > There is absolutely no warranty for GDB.  Type "show warranty" for details.
 > >  > This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
 > >  > Core was generated by `beam.smp'.
 > >  > Program terminated with signal 6, Aborted.
 > >  > Reading symbols from /lib/libutil.so.9...(no debugging symbols found)...done.
 > >  > Loaded symbols for /lib/libutil.so.9
 > >  > Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
 > >  > Loaded symbols for /lib/libm.so.5
 > >  > Reading symbols from /usr/lib/libelf.so.1...(no debugging symbols found)...done.
 > >  > Loaded symbols for /usr/lib/libelf.so.1
 > >  > Reading symbols from /lib/libncurses.so.8...(no debugging symbols found)...done.
 > >  > Loaded symbols for /lib/libncurses.so.8
 > >  > Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
 > >  > Loaded symbols for /lib/libz.so.6
 > >  > Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
 > >  > Loaded symbols for /lib/libthr.so.3
 > >  > Reading symbols from /usr/lib/librt.so.1...(no debugging symbols found)...done.
 > >  > Loaded symbols for /usr/lib/librt.so.1
 > >  > Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
 > >  > Loaded symbols for /lib/libc.so.7
 > >  > Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
 > >  > Loaded symbols for /libexec/ld-elf.so.1
 > >  > #0  0x0000000801b7fb8a in thr_kill () from /lib/libc.so.7
 > >  > [New Thread 80240b000 (LWP 101044/beam.smp)]
 > >  > [New Thread 80240ac00 (LWP 101042/beam.smp)]
 > >  > [New Thread 80240a800 (LWP 101040/beam.smp)]
 > >  > [New Thread 80240a400 (LWP 101036/beam.smp)]
 > >  > [New Thread 80240a000 (LWP 100892/beam.smp)]
 > >  > [New Thread 802409c00 (LWP 100803/beam.smp)]
 > >  > [New Thread 802409800 (LWP 100651/beam.smp)]
 > >  > [New Thread 802409400 (LWP 100522/beam.smp)]
 > >  > [New Thread 802409000 (LWP 100507/beam.smp)]
 > >  > [New Thread 802408c00 (LWP 100459/beam.smp)]
 > >  > [New Thread 802408800 (LWP 100415/beam.smp)]
 > >  > [New Thread 802408400 (LWP 100309/beam.smp)]
 > >  > [New Thread 802408000 (LWP 100268/beam.smp)]
 > >  > [New Thread 802407c00 (LWP 100254/beam.smp)]
 > >  > [New Thread 802407800 (LWP 100253/beam.smp)]
 > >  > [New Thread 802407400 (LWP 100248/beam.smp)]
 > >  > [New Thread 802407000 (LWP 100245/beam.smp)]
 > >  > [New Thread 802406800 (LWP 100238/beam.smp)]
 > >  > [New Thread 802406400 (LWP 100142/beam.smp)]
 > >  > (gdb) bt
 > >  > #0  0x0000000801b7fb8a in thr_kill () from /lib/libc.so.7
 > >  > #1  0x0000000801b7faf6 in raise () from /lib/libc.so.7
 > >  > #2  0x0000000801b7e2e9 in abort () from /lib/libc.so.7
 > >  > #3  0x000000000049e7d7 in erl_exit_vv ()
 > >  > #4  0x000000000049c813 in erl_exit ()
 > >  > #5  0x000000000063c49b in erts_fp_check_init_error ()
 > >  > #6  0x0000000000502ef3 in erts_gc_mixed_plus ()
 > >  > #7  0x00000000004615e7 in process_main ()
 > >  > #8  0x00000000004ece11 in sched_thread_func ()
 > >  > #9  0x000000000069f6ac in thr_wrapper ()
 > >  > #10 0x00000008016196d5 in pthread_create () from /lib/libthr.so.3
 > >  > #11 0x0000000000000000 in ?? ()
 > >  > (gdb) info symbol 0x502ef3
 > >  > erts_gc_mixed_plus + 547 in section .text
 > >  > (gdb) q
 > > 
 > > Ok, so this is the ERTS_FP_CHECK_INIT() at the start of erts_gc_mixed_plus()
 > > which detects a pending FPE, which is not allowed at this point.
 > > 
 > > There are really only three reasons why this might trigger:
 > > 1. We got an FP exception outside of checked code (between
 > >    ERTS_FP_CHECK_INIT() and ERTS_FP_ERROR()).
 > > 2. A libc or libm function called matherr() outside of checked code.
 > > 3. A process' fp_exception field is uninitialized or clobbered.
 > > 
 > > Please try the attached debugging patch for 18-rc1.  It enables
 > > logging of FP exceptions and matherr(), which should tell us more
 > > about what's really going on.
 > > 
 > > I'm still bothered about the suspiciously low PC address (0x4a3)
 > > reported.  Can you check if that actually corresponds to an address
 > > in beam.smp or one of its dynamically linked libraries?
 > > 
 > > /Mikael
 > > 
 > 
 > > diff --git a/erts/emulator/sys/unix/sys_float.c b/erts/emulator/sys/unix/sys_float.c
 > > index 2ffa649..d35bf4b 100644
 > > --- a/erts/emulator/sys/unix/sys_float.c
 > > +++ b/erts/emulator/sys/unix/sys_float.c
 > > @@ -638,7 +638,7 @@ static void fpe_sig_action(int sig, siginfo_t *si, void *puc)
 > >      fpstate->mxcsr = 0x1F80;
 > >      fpstate->sw &= ~0xFF;
 > >  #endif
 > > -#if 0
 > > +#if 1
 > >      {
 > >  	char buf[64];
 > >  	snprintf(buf, sizeof buf, "%s: FPE at %p\r\n", __FUNCTION__, (void*)pc);
 > > @@ -839,6 +839,12 @@ matherr(struct exception *exc)
 > >  {
 > >  #if !defined(NO_FPE_SIGNALS)
 > >      volatile unsigned long *fpexnp = erts_get_current_fp_exception();
 > > +#if 1
 > > +    char buf[128];
 > > +    snprintf(buf, sizeof buf, "sys_float.c:matherr() type %d from %s at %p\r\n",
 > > +	     exc->type, exc->name, (void*)__builtin_return_address(0));
 > > +    write(2, buf, strlen(buf));
 > > +#endif
 > >      if (fpexnp != NULL)
 > >  	*fpexnp = (unsigned long)__builtin_return_address(0);
 > >  #endif
 > 
 > > 
 > >  > 
 > >  > ++> Kenji Rikitake <kenji@REDACTED> [2015-04-25 22:19:34 +0900]:
 > >  > > Date: Sat, 25 Apr 2015 22:19:34 +0900
 > >  > > From: Kenji Rikitake <kenji@REDACTED>
 > >  > > To: Mikael Pettersson <mikpelinux@REDACTED>
 > >  > > Cc: erlang-bugs@REDACTED
 > >  > > Subject: [erlang-bugs] FreeBSD FPE issue on ERTS_FP_CHECK_INIT Re:
 > >  > >  ERTS_FP_CHECK_INIT error of HiPE in 18.0-rc1 running on FreeBSD
 > >  > >  10.1-STABLE
 > >  > > 
 > >  > > Mikael:
 > >  > > 
 > >  > > > I strongly suspect a FreeBSD issue wrt FPE:s.  Can you rebuild OTP with
 > >  > > > --disable-hipe --enable-fp-exceptions and then repeat your tests?
 > >  > > 
 > >  > > Running the emprng tests on 18.0 built with the above options generated
 > >  > > the following error many times:
 > >  > > 
 > >  > > ERTS_FP_CHECK_INIT at 0x502ef3: detected unhandled FPE at 0x4a3
 > >  > > 
 > >  > > So this is not a HiPE issue; it is most likely a FreeBSD FPE issue.
 > >  > > 
 > >  > > I'll try further tests later.
 > >  > > 
 > >  > > Kenji Rikitake
 > >  > > 
 > >  > > ++> Mikael Pettersson <mikpelinux@REDACTED> [2015-04-25 12:57:42 +0200]:
 > >  > > > Kenji Rikitake writes:
 > >  > > >  > I've seen a massive number of errors like the following when running
 > >  > > >  > a common test on 18.0-rc1 with HiPE:
 > >  > > >  > 
 > >  > > >  > ERTS_FP_CHECK_INIT at 0x50e193: detected unhandled FPE at 0x4ad
 > >  > > >  > 
 > >  > > >  > This didn't happen when HiPE was disabled (--disable-hipe).
 > >  > > >  > 
 > >  > > >  > Tracing the source, I found that this message is printed by
 > >  > > >  > erts_fp_check_init_error() in erts/emulator/sys/unix/sys_float.c,
 > >  > > >  > most likely called from
 > >  > > >  > hipe_fclearerror_error() in erts/emulator/hipe/hipe_native_bif.c.
 > >  > > >  > 
 > >  > > >  > The environment is FreeBSD amd64 10.1-STABLE #64 r281235,
 > >  > > >  > with the following kerl build options:
 > >  > > >  > 
 > >  > > >  > export CC=clang CXX=clang CFLAGS="-O3 -fstack-protector" LDFLAGS="-fstack-protector" MAKEFLAGS="-j8"
 > >  > > >  > KERL_CONFIGURE_OPTIONS="--disable-native-libs --enable-vm-probes --with-dynamic-trace=dtrace --with-ssl=/usr/local --with-javac --enable-hipe --enable-kernel-poll --with-wx-config=/usr/local/bin/wxgtk2u-2.8-config --without-odbc --enable-threads --enable-sctp --enable-smp-support --disable-silent-rules"
 > >  > > >  > 
 > >  > > >  > You can reproduce this with:
 > >  > > >  > 
 > >  > > >  > git clone https://github.com/jj1bdx/emprng/
 > >  > > >  > cd emprng
 > >  > > >  > make tests
 > >  > > > 
 > >  > > > I'm not able to reproduce any unhandled FPE:s on Linux/x86_64 with 18.0-rc1
 > >  > > > configured with --enable-hipe --enable-fp-exceptions.
 > >  > > > 
 > >  > > > I strongly suspect a FreeBSD issue wrt FPE:s.  Can you rebuild OTP with
 > >  > > > --disable-hipe --enable-fp-exceptions and then repeat your tests?
 > >  > > > 
 > >  > > > It would also be helpful if you attached a debugger to beam.smp, put a
 > >  > > > breakpoint in erts_fp_check_init_error(), and took a backtrace from the
 > >  > > > thread when that breakpoint is hit.  (You can also try to map the PC
 > >  > > > value 0x50e193 reported above to the corresponding C function via beam.smp's
 > >  > > > symbol table.)
 > >  > > > 
 > >  > > > Finally, I find the 0x4ad address suspiciously low.  Is that address range
 > >  > > > even mapped in your beam.smp process?  I don't know how to check that on
 > >  > > > FreeBSD, but on Linux I would look in /proc/${pid}/maps.
 > >  > > > 
 > >  > > > /Mikael
 > >  > > _______________________________________________
 > >  > > erlang-bugs mailing list
 > >  > > erlang-bugs@REDACTED
 > >  > > http://erlang.org/mailman/listinfo/erlang-bugs
 > > 
