[erlang-bugs] Mac OS X - trunc for large float causes ERTS_FP_CHECK_INIT at [...]: detected unhandled FPE at [...]

Tue Nov 29 01:44:56 CET 2011

On Wed, May 4, 2011 at 7:03 AM, Bob Ippolito <bob@REDACTED> wrote:

> On Wed, May 4, 2011 at 1:14 AM, Mikael Pettersson <mikpe@REDACTED> wrote:
> > On Tue, 3 May 2011 16:48:20 -0700, Bob Ippolito <bob@REDACTED> wrote:
> >> On Tue, May 3, 2011 at 3:35 PM, Mikael Pettersson <mikpe@REDACTED> wrote:
> >> > On Tue, 3 May 2011 07:18:34 -0700, Bob Ippolito <bob@REDACTED> wrote:
> >> >> On Tue, May 3, 2011 at 1:04 AM, Mikael Pettersson <mikpe@REDACTED> wrote:
> >> >> > Bob Ippolito writes:
> >> >> >  > I only see this error on Mac OS X. I have not been able to
> >> >> >  > reproduce in Linux.
> >> >> >  >
> >> >> >  > Here's an example, any number larger than 16#7ffffffffffffe00 will
> >> >> >  > cause this error.
> >> >> >  >
> >> >> >  > Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:4:4] [rq:4]
> >> >> >  > [async-threads:4] [hipe] [kernel-poll:true]
> >> >> >  >
> >> >> >  > Eshell V5.8.3  (abort with ^G)
> >> >> >  > 1> trunc(16#7ffffffffffffdff * 1.0).
> >> >> >  > 9223372036854774784
> >> >> >  > 2> trunc(16#7ffffffffffffdff * 1.0).
> >> >> >  > 9223372036854774784
> >> >> >  > 3> trunc(16#7ffffffffffffe00 * 1.0).
> >> >> >  > 9223372036854775808
> >> >> >  > 4> trunc(16#7ffffffffffffe00 * 1.0).
> >> >> >  > ERTS_FP_CHECK_INIT at 0x10086210: detected unhandled FPE at 0x19223372036854775808
> >> >> >  > 5> trunc(16#7ffffffffffffe00 * 1.0).
> >> >> >  > ERTS_FP_CHECK_INIT at 0x10086210: detected unhandled FPE at 0x19223372036854775808
> >> >> >  > 6> io:format("~s~n", [os:cmd("uname -a")]).
> >> >> >  > Darwin ba.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29
> >> >> >  > 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
> >> >> >  >
> >> >> >  > Here's another example:
> >> >> >  >
> >> >> >  > Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:4:4] [rq:4]
> >> >> >  > [async-threads:4] [hipe] [kernel-poll:true]
> >> >> >  >
> >> >> >  > Eshell V5.8.3  (abort with ^G)
> >> >> >  > 1> <<F/float>> = <<67,224,0,0,0,0,0,0>>, trunc(F).
> >> >> >  > 9223372036854775808
> >> >> >  > 2> <<F/float>> = <<67,224,0,0,0,0,0,0>>, trunc(F).
> >> >> >  > ERTS_FP_CHECK_INIT at 0x10083e24: detected unhandled FPE at 0x19223372036854775808
> >> >> >  > 3> <<F/float>> = <<67,224,0,0,0,0,0,0>>, trunc(F).
> >> >> >  > ERTS_FP_CHECK_INIT at 0x10083e24: detected unhandled FPE at 0x19223372036854775808
> >> >> >
> >> >> > It means that the code at 0x19223372036854775808 in the
> >> >> > Erlang VM needs to use the proper ERTS_FP_CHECK_<foo> macros.
> >> >> >
> >> >> > Please attach gdb (or whatever debugger Darwin uses) to a running
> >> >> > Erlang VM and disassemble the code at 0x19223372036854775808.
> >> >> > We need to know the name of the enclosing function, and preferably
> >> >> > also the actual instruction sequence that throws the FPE. If gdb
> >> >> > can show the exact original source code line then that's even
> >> >> > better.
> >> >> >
> >> >> > If this is in libc rather than the Erlang VM itself, then we need
> >> >> > a call trace to identify which code in the VM called out to this
> >> >> > FP-throwing code.  For that you should probably plant a breakpoint
> >> >> > at 0x19223372036854775808 and then evaluate one of those Erlang
> >> >> > expressions above to trigger the FPE.
> >> >> >
> >> >>
> >> >> Well, it's actually saying 0x1, the result of the trunc is
> >> >> 9223372036854775808 and the message string is truncated to 64
> >> >> characters which is not enough to show it all. Perhaps the buffer size
> >> >> in erts_fp_check_init_error should be adjusted.
> >> >
> >> > Something in your terminal or email client ate a \r\n sequence after the
> >> > 0x1 from erts_fp_check_init_error() making it appear glued together with
> >> > the 9223372036854775808 that the erlang prompt printed.
> >>
> >> Not my terminal or email client, this is a bug in
> >> erts_fp_check_init_error. It only allocates a 64 byte buffer for the
> >> error message. The pointer address and the \r\n are eaten because the
> >> buffer is too small to fit the whole error message. buf[64] is too
> >> small... the format string itself is already 57 chars (including the
> >> NULL).
> >
> > Ah yes. I did see your comment about the short buffer but failed
> > to connect that with the strange message. The buffer needs to be at
> > least (calculating..) 89 bytes, making it 96 bytes should suffice.
> >
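The buffer arithmetic above can be checked with a small C sketch. The format string below is an assumption reconstructed from the message printed earlier in this thread (it is 56 characters plus the NUL, matching the 57 quoted above); the real ERTS source may differ, and format_fpe_message is a hypothetical name. Addresses are passed as integers so the output does not depend on how a given libc prints %p.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed format, reconstructed from the message seen in the thread.
 * Not copied from the ERTS source. */
#define FPE_FMT \
    "ERTS_FP_CHECK_INIT at 0x%llx: detected unhandled FPE at 0x%llx\r\n"

/* Formats the message into buf (capacity cap bytes, including the NUL).
 * Returns the length the complete message needs, excluding the NUL,
 * so a return value >= cap means the text in buf was cut short. */
static int format_fpe_message(char *buf, size_t cap,
                              unsigned long long caller,
                              unsigned long long fault_pc)
{
    assert(buf != NULL && cap > 0);
    return snprintf(buf, cap, FPE_FMT, caller, fault_pc);
}
```

With the addresses from this thread (0x10086210 and 0x10025433) the full message needs 72 characters, so a 64-byte buffer keeps only the first 63: everything up to and including "0x1", with the rest of the address and the \r\n dropped. With two 16-digit addresses the message needs 88 characters plus the NUL, which is the 89 bytes calculated above.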
> > This means that my comment about 0x1 and the wrong type SIGFPE
> > handler was invalid. (0x1 is used as a fake PC value in that case.)
> >
> >> Maybe you missed it in my previous email, it's not 0x1, it is
> >> 0x10025433. I showed that by breaking at the function that prints the
> >> error.
> >> Breakpoint 1, erts_fp_check_init_error (fpexnp=0x110f2528) at
> >> sys/unix/sys_float.c:87
> >> 87      {
> >> (gdb) p (void*)*fpexnp
> >> $1 = (void *) 0x10025433
> >
> > In your previous disassembly that pointed to a cvttsd2siq instruction.
> > That can probably throw a SIGFPE, but I see similar code in a build on
> > Linux, and there SIGFPE isn't thrown.
> >
> > If you attach gdb to a freshly started beam instance, let the process
> > continue, and evaluate one of those expressions at the Erlang prompt,
> > then gdb should wake up with a SIGFPE at that location.  At that point
> > dump parts of the SSE2 state with:
> >
> > print $mxcsr (SSE control and status flags)
> > print $xmm1 (the source operand in the failing SSE instruction)
> >
> > (If the first SIGFPE occurs elsewhere, disassemble that code first, then
> > adjust the print $xmm1 to match that instruction's source operand.)
>
> Program received signal EXC_ARITHMETIC, Arithmetic exception.
> [Switching to process 14985]
> double_to_integer [inlined] () at
> /Users/bob/src/otp_src_R14B02/erts/emulator/beam/erl_bif_guard.c:301
> 301             d = x;            /* trunc */
> (gdb) info frame
> Stack level 0, frame at 0x10:
>  rip = 0x10025433 in trunc_1 (beam/erl_bif_guard.c:301); saved rip
> 0x10025433
>  called by frame at 0x0
>  source language c.
>  Arglist at unknown address.
>  Locals at unknown address, Previous frame's sp in rsp
> (gdb) disassemble 0x0000000010025433
> [...]
> 0x0000000010025433 <trunc_1+371>:       cvttsd2siq %xmm1,%rdx
> [...]
> (gdb) print $mxcsr
> $1 = 6433
> (gdb) print $xmm1
> $2 = {
>  v4_float = {0, 0, 448, 0},
>  v2_double = {0, 9.2233720368547758e+18},
>  v16_int8 = {0, 0, 0, 0, 0, 0, 0, 0, 67, -32, 0, 0, 0, 0, 0, 0},
>  v8_int16 = {0, 0, 0, 0, 17376, 0, 0, 0},
>  v4_int32 = {0, 0, 1138753536, 0},
>  v2_int64 = {0, 4890909195324358656},
>  uint128 = 57411
> }
> (gdb) print $rdx
> $3 = 16
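The $mxcsr value above, 6433, is 0x1921. A small C sketch for reading it follows, using the documented x86 MXCSR layout: bits 0 to 5 are the sticky exception flags and bits 7 to 12 are the matching mask bits, where a set mask bit means the exception is suppressed. The enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit index of each SSE exception inside the MXCSR flag field.
 * The matching mask bit sits 7 positions higher. */
enum {
    SSE_INVALID   = 0,
    SSE_DENORMAL  = 1,
    SSE_DIVZERO   = 2,
    SSE_OVERFLOW  = 3,
    SSE_UNDERFLOW = 4,
    SSE_PRECISION = 5
};

/* True when the sticky flag for exception e has been raised. */
static bool flag_raised(uint32_t mxcsr, unsigned e)
{
    assert(e <= SSE_PRECISION);
    return ((mxcsr >> e) & 1u) != 0;
}

/* True when exception e is unmasked, i.e. it traps instead of
 * producing a default result. */
static bool is_unmasked(uint32_t mxcsr, unsigned e)
{
    assert(e <= SSE_PRECISION);
    return ((mxcsr >> (e + 7)) & 1u) == 0;
}
```

For 6433 this reports the invalid-operation and precision flags as raised, with invalid operation, divide-by-zero and overflow unmasked and the other three masked. For comparison, the power-on default 0x1F80 has all six exceptions masked.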

For the archives - just saw this again on R14B04, Mac OS X.

It appears to be fixed in R15A (git 1c99516 from a couple weeks ago) though.

-bob