[erlang-bugs] Mac OS X - trunc for large float causes ERTS_FP_CHECK_INIT at [...]: detected unhandled FPE at [...]

Bob Ippolito bob@REDACTED
Wed May 4 16:03:59 CEST 2011


On Wed, May 4, 2011 at 1:14 AM, Mikael Pettersson <mikpe@REDACTED> wrote:
> On Tue, 3 May 2011 16:48:20 -0700, Bob Ippolito <bob@REDACTED> wrote:
>> On Tue, May 3, 2011 at 3:35 PM, Mikael Pettersson <mikpe@REDACTED> wrote:
>> > On Tue, 3 May 2011 07:18:34 -0700, Bob Ippolito <bob@REDACTED> wrote:
>> >> On Tue, May 3, 2011 at 1:04 AM, Mikael Pettersson <mikpe@REDACTED> wrote:
>> >> > Bob Ippolito writes:
>> >> >  > I only see this error on Mac OS X. I have not been able to reproduce it on Linux.
>> >> >  >
>> >> >  > Here's an example: any number greater than or equal to 16#7ffffffffffffe00 will
>> >> >  > cause this error.
>> >> >  >
>> >> >  > Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:4:4] [rq:4]
>> >> >  > [async-threads:4] [hipe] [kernel-poll:true]
>> >> >  >
>> >> >  > Eshell V5.8.3  (abort with ^G)
>> >> >  > 1> trunc(16#7ffffffffffffdff * 1.0).
>> >> >  > 9223372036854774784
>> >> >  > 2> trunc(16#7ffffffffffffdff * 1.0).
>> >> >  > 9223372036854774784
>> >> >  > 3> trunc(16#7ffffffffffffe00 * 1.0).
>> >> >  > 9223372036854775808
>> >> >  > 4> trunc(16#7ffffffffffffe00 * 1.0).
>> >> >  > ERTS_FP_CHECK_INIT at 0x10086210: detected unhandled FPE at
>> >> >  > 0x19223372036854775808
>> >> >  > 5> trunc(16#7ffffffffffffe00 * 1.0).
>> >> >  > ERTS_FP_CHECK_INIT at 0x10086210: detected unhandled FPE at
>> >> >  > 0x19223372036854775808
>> >> >  > 6> io:format("~s~n", [os:cmd("uname -a")]).
>> >> >  > Darwin ba.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29
>> >> >  > 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
>> >> >  >
>> >> >  > Here's another example:
>> >> >  >
>> >> >  > Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:4:4] [rq:4]
>> >> >  > [async-threads:4] [hipe] [kernel-poll:true]
>> >> >  >
>> >> >  > Eshell V5.8.3  (abort with ^G)
>> >> >  > 1> <<F/float>> = <<67,224,0,0,0,0,0,0>>, trunc(F).
>> >> >  > 9223372036854775808
>> >> >  > 2> <<F/float>> = <<67,224,0,0,0,0,0,0>>, trunc(F).
>> >> >  > ERTS_FP_CHECK_INIT at 0x10083e24: detected unhandled FPE at
>> >> >  > 0x19223372036854775808
>> >> >  > 3> <<F/float>> = <<67,224,0,0,0,0,0,0>>, trunc(F).
>> >> >  > ERTS_FP_CHECK_INIT at 0x10083e24: detected unhandled FPE at
>> >> >  > 0x19223372036854775808
>> >> >
>> >> > It means that the code at 0x19223372036854775808 in the
>> >> > Erlang VM needs to use the proper ERTS_FP_CHECK_<foo> macros.
>> >> >
>> >> > Please attach gdb (or whatever debugger Darwin uses) to a running
>> >> > Erlang VM and disassemble the code at 0x19223372036854775808.
>> >> > We need to know the name of the enclosing function, and preferably
>> >> > also the actual instruction sequence that throws the FPE. If gdb
>> >> > can show the exact original source code line then that's even better.
>> >> >
>> >> > If this is in libc rather than the Erlang VM itself, then we need
>> >> > a call trace to identify which code in the VM called out to this
>> >> > FP-throwing code. For that you should probably plant a breakpoint
>> >> > at 0x19223372036854775808 and then evaluate one of those Erlang
>> >> > expressions above to trigger the FPE.
>> >> >
>> >>
>> >> Well, it's actually printing 0x1; the 9223372036854775808 after it is
>> >> the result of the trunc. The message string is truncated to fit a
>> >> 64-byte buffer, which is not enough to show it all. Perhaps the buffer
>> >> size in erts_fp_check_init_error should be adjusted.
>> >
>> > Something in your terminal or email client ate a \r\n sequence after the
>> > 0x1 from erts_fp_check_init_error() making it appear glued together with
>> > the 9223372036854775808 that the erlang prompt printed.
>>
>> It's not my terminal or email client; this is a bug in
>> erts_fp_check_init_error. It only allocates a 64-byte buffer for the
>> error message. The rest of the pointer address and the \r\n are cut off
>> because the buffer is too small to hold the whole error message. buf[64]
>> is too small: the format string itself is already 57 chars (including
>> the NUL terminator).
>
> Ah yes. I did see your comment about the short buffer but failed
> to connect that with the strange message. The buffer needs to be at
> least (calculating...) 89 bytes; making it 96 bytes should suffice.
>
> This means that my comment about 0x1 and the wrong type of SIGFPE
> handler was invalid. (0x1 is used as a fake PC value in that case.)
>
>> Maybe you missed it in my previous email: it's not 0x1, it is
>> 0x10025433. I showed that by breaking at the function that prints the
>> error:
>> Breakpoint 1, erts_fp_check_init_error (fpexnp=0x110f2528) at
>> sys/unix/sys_float.c:87
>> 87      {
>> (gdb) p (void*)*fpexnp
>> $1 = (void *) 0x10025433
>
> In your previous disassembly, that address pointed to a cvttsd2siq instruction.
> That can probably throw a SIGFPE, but I see similar code in a build on
> Linux, and there SIGFPE isn't thrown.
>
> If you attach gdb to a freshly started beam instance, let the process
> continue, and evaluate one of those expressions at the Erlang prompt,
> then gdb should wake up with a SIGFPE at that location.  At that point
> dump parts of the SSE2 state with:
>
> print $mxcsr (SSE control and status flags)
> print $xmm1 (the source operand in the failing SSE instruction)
>
> (If the first SIGFPE occurs elsewhere, disassemble that code first, then
> adjust the print $xmm1 to match that instruction's source operand.)

Program received signal EXC_ARITHMETIC, Arithmetic exception.
[Switching to process 14985]
double_to_integer [inlined] () at
/Users/bob/src/otp_src_R14B02/erts/emulator/beam/erl_bif_guard.c:301
301		d = x;            /* trunc */
(gdb) info frame
Stack level 0, frame at 0x10:
 rip = 0x10025433 in trunc_1 (beam/erl_bif_guard.c:301); saved rip 0x10025433
 called by frame at 0x0
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp in rsp
(gdb) disassemble 0x0000000010025433
[...]
0x0000000010025433 <trunc_1+371>:	cvttsd2siq %xmm1,%rdx
[...]
(gdb) print $mxcsr
$1 = 6433
(gdb) print $xmm1
$2 = {
  v4_float = {0, 0, 448, 0},
  v2_double = {0, 9.2233720368547758e+18},
  v16_int8 = {0, 0, 0, 0, 0, 0, 0, 0, 67, -32, 0, 0, 0, 0, 0, 0},
  v8_int16 = {0, 0, 0, 0, 17376, 0, 0, 0},
  v4_int32 = {0, 0, 1138753536, 0},
  v2_int64 = {0, 4890909195324358656},
  uint128 = 57411
}
(gdb) print $rdx
$3 = 16

-bob