<div class="gmail_quote">On Wed, May 4, 2011 at 7:03 AM, Bob Ippolito <span dir="ltr"><<a href="mailto:bob@redivi.com">bob@redivi.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="HOEnZb"><div class="h5">On Wed, May 4, 2011 at 1:14 AM, Mikael Pettersson <<a href="mailto:mikpe@it.uu.se">mikpe@it.uu.se</a>> wrote:<br>

> On Tue, 3 May 2011 16:48:20 -0700, Bob Ippolito <<a href="mailto:bob@redivi.com">bob@redivi.com</a>> wrote:<br>

>> On Tue, May 3, 2011 at 3:35 PM, Mikael Pettersson <<a href="mailto:mikpe@it.uu.se">mikpe@it.uu.se</a>> wrote:<br>

>> > On Tue, 3 May 2011 07:18:34 -0700, Bob Ippolito <<a href="mailto:bob@redivi.com">bob@redivi.com</a>> wrote:<br>

>> >> On Tue, May 3, 2011 at 1:04 AM, Mikael Pettersson <<a href="mailto:mikpe@it.uu.se">mikpe@it.uu.se</a>> wrote:<br>

>> >> > Bob Ippolito writes:<br>

>> >> > > I only see this error on Mac OS X. I have not been able to reproduce in Linux.<br>

>> >> > ><br>

>> >> > > Here's an example, any number larger than 16#7ffffffffffffe00 will cause this error.<br>

>> >> > ><br>

>> >> > > Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:4:4] [rq:4]<br>

>> >> > > [async-threads:4] [hipe] [kernel-poll:true]<br>

>> >> > ><br>

>> >> > > Eshell V5.8.3  (abort with ^G)<br>

>> >> > > 1> trunc(16#7ffffffffffffdff * 1.0).<br>

>> >> > > 9223372036854774784<br>

>> >> > > 2> trunc(16#7ffffffffffffdff * 1.0).<br>

>> >> > > 9223372036854774784<br>

>> >> > > 3> trunc(16#7ffffffffffffe00 * 1.0).<br>

>> >> > > 9223372036854775808<br>

>> >> > > 4> trunc(16#7ffffffffffffe00 * 1.0).<br>

>> >> > > ERTS_FP_CHECK_INIT at 0x10086210: detected unhandled FPE at<br>

>> >> > > 0x19223372036854775808<br>

>> >> > > 5> trunc(16#7ffffffffffffe00 * 1.0).<br>

>> >> > > ERTS_FP_CHECK_INIT at 0x10086210: detected unhandled FPE at<br>

>> >> > > 0x19223372036854775808<br>

>> >> > > 6> io:format("~s~n", [os:cmd("uname -a")]).<br>

>> >> > > Darwin ba.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29<br>

>> >> > > 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386<br>

>> >> > ><br>

>> >> > > Here's another example:<br>

>> >> > ><br>

>> >> > > Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:4:4] [rq:4]<br>

>> >> > > [async-threads:4] [hipe] [kernel-poll:true]<br>

>> >> > ><br>

>> >> > > Eshell V5.8.3  (abort with ^G)<br>

>> >> > > 1> <<F/float>> = <<67,224,0,0,0,0,0,0>>, trunc(F).<br>

>> >> > > 9223372036854775808<br>

>> >> > > 2> <<F/float>> = <<67,224,0,0,0,0,0,0>>, trunc(F).<br>

>> >> > > ERTS_FP_CHECK_INIT at 0x10083e24: detected unhandled FPE at<br>

>> >> > > 0x19223372036854775808<br>

>> >> > > 3> <<F/float>> = <<67,224,0,0,0,0,0,0>>, trunc(F).<br>

>> >> > > ERTS_FP_CHECK_INIT at 0x10083e24: detected unhandled FPE at<br>

>> >> > > 0x19223372036854775808<br>

>> >> ><br>

>> >> > It means that the code at 0x19223372036854775808 in the<br>

>> >> > Erlang VM needs to use the proper ERTS_FP_CHECK_<foo> macros.<br>

>> >> ><br>

>> >> > Please attach gdb (or whatever debugger Darwin uses) to a running<br>

>> >> > Erlang VM and disassemble the code at 0x19223372036854775808.<br>

>> >> > We need to know the name of the enclosing function, and preferably<br>

>> >> > also the actual instruction sequence that throws the FPE. If gdb<br>

>> >> > can show the exact original source code line then that's even better.<br>

>> >> ><br>

>> >> > If this is in libc rather than the Erlang VM itself, then we need<br>

>> >> > a call trace to identify which code in the VM called out to this<br>

>> >> > FP-throwing code.  For that you should probably plant a breakpoint<br>

>> >> > at 0x19223372036854775808 and then evaluate one of those Erlang<br>

>> >> > expressions above to trigger the FPE.<br>

>> >> ><br>

>> >><br>

>> >> Well, it's actually saying 0x1, the result of the trunc is<br>

>> >> 9223372036854775808 and the message string is truncated to 64<br>

>> >> characters which is not enough to show it all. Perhaps the buffer size<br>

>> >> in erts_fp_check_init_error should be adjusted.<br>

>> ><br>

>> > Something in your terminal or email client ate a \r\n sequence after the<br>

>> > 0x1 from erts_fp_check_init_error() making it appear glued together with<br>

>> > the 9223372036854775808 that the erlang prompt printed.<br>

>><br>

>> Not my terminal or email client, this is a bug in<br>

>> erts_fp_check_init_error. It only allocates a 64 byte buffer for the<br>

>> error message. The pointer address and the \r\n are eaten because the<br>

>> buffer is too small to fit the whole error message. buf[64] is too<br>

>> small... the format string itself is already 57 chars (including the<br>

>> NULL).<br>

><br>

> Ah yes. I did see your comment about the short buffer but failed<br>

> to connect that with the strange message. The buffer needs to be at<br>

> least (calculating..) 89 bytes, making it 96 bytes should suffice.<br>

><br>

> This means that my comment about 0x1 and the wrong type SIGFPE<br>

> handler was invalid. (0x1 is used as a fake PC value in that case.)<br>

><br>

>> Maybe you missed it in my previous email, it's not 0x1, it is<br>

>> 0x10025433. I showed that by breaking at the function that prints the<br>

>> error.<br>

>> Breakpoint 1, erts_fp_check_init_error (fpexnp=0x110f2528) at<br>

>> sys/unix/sys_float.c:87<br>

>> 87      {<br>

>> (gdb) p (void*)*fpexnp<br>

>> $1 = (void *) 0x10025433<br>

><br>

> In your previous disassembly that pointed to a cvttsd2siq instruction.<br>

> That can probably throw a SIGFPE, but I see similar code in a build on<br>

> Linux, and there SIGFPE isn't thrown.<br>

><br>

> If you attach gdb to a freshly started beam instance, let the process<br>

> continue, and evaluate one of those expressions at the Erlang prompt,<br>

> then gdb should wake up with a SIGFPE at that location.  At that point<br>

> dump parts of the SSE2 state with:<br>

><br>

> print $mxcsr (SSE control and status flags)<br>

> print $xmm1 (the source operand in the failing SSE instruction)<br>

><br>

> (If the first SIGFPE occurs elsewhere, disassemble that code first, then<br>

> adjust the print $xmm1 to match that instruction's source operand.)<br>

<br>

</div></div>Program received signal EXC_ARITHMETIC, Arithmetic exception.<br>

[Switching to process 14985]<br>

<div class="im">double_to_integer [inlined] () at<br>

/Users/bob/src/otp_src_R14B02/erts/emulator/beam/erl_bif_guard.c:301<br>

301             d = x;            /* trunc */<br>

</div>(gdb) info frame<br>

Stack level 0, frame at 0x10:<br>

 rip = 0x10025433 in trunc_1 (beam/erl_bif_guard.c:301); saved rip 0x10025433<br>

 called by frame at 0x0<br>

 source language c.<br>

 Arglist at unknown address.<br>

 Locals at unknown address, Previous frame's sp in rsp<br>

(gdb) disassemble 0x0000000010025433<br>

[...]<br>

0x0000000010025433 <trunc_1+371>:       cvttsd2siq %xmm1,%rdx<br>

[...]<br>

(gdb) print $mxcsr<br>

$1 = 6433<br>

(gdb) print $xmm1<br>

$2 = {<br>

  v4_float = {0, 0, 448, 0},<br>

  v2_double = {0, 9.2233720368547758e+18},<br>

  v16_int8 = {0, 0, 0, 0, 0, 0, 0, 0, 67, -32, 0, 0, 0, 0, 0, 0},<br>

  v8_int16 = {0, 0, 0, 0, 17376, 0, 0, 0},<br>

  v4_int32 = {0, 0, 1138753536, 0},<br>

  v2_int64 = {0, 4890909195324358656},<br>

  uint128 = 57411<br>

}<br>

(gdb) print $rdx<br>

$3 = 16</blockquote><div><br></div><div>For the archives - just saw this again on R14B04, Mac OS X.</div><div><br></div><div>It appears to be fixed in R15A (git 1c99516 from a couple weeks ago) though.</div><div><br></div>
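<div>For anyone reading this in the archives, here is a minimal sketch (not code from ERTS) of one way to read the gdb output quoted above. It assumes the standard x86 MXCSR layout, with the sticky exception flags in bits 0-5 and the matching mask bits in bits 7-12. The names <code>mxcsr_flag_set</code>, <code>mxcsr_unmasked</code> and <code>fits_in_int64</code> are hypothetical helpers made up for illustration.</div>

```c
#include <stdint.h>

/* Exception bit positions in MXCSR (assumed standard x86 layout). */
enum mxcsr_exception {
    MXCSR_INVALID = 0,
    MXCSR_DENORMAL = 1,
    MXCSR_DIVIDE_BY_ZERO = 2,
    MXCSR_OVERFLOW = 3,
    MXCSR_UNDERFLOW = 4,
    MXCSR_PRECISION = 5
};

/* Nonzero if the sticky status flag for exception e is set. */
int mxcsr_flag_set(uint32_t mxcsr, enum mxcsr_exception e)
{
    return (int)((mxcsr >> e) & 1u);
}

/* Nonzero if exception e is unmasked, meaning it traps instead of
 * only being recorded. Each mask bit sits 7 bits above its flag. */
int mxcsr_unmasked(uint32_t mxcsr, enum mxcsr_exception e)
{
    return !((mxcsr >> (e + 7)) & 1u);
}

/* Nonzero if x, truncated toward zero, is representable as int64_t.
 * Both bounds are exact doubles (-2^63 and 2^63); NaN fails both tests. */
int fits_in_int64(double x)
{
    return x >= -9223372036854775808.0 && x < 9223372036854775808.0;
}
```

<div>Under those assumptions, <code>mxcsr_flag_set(6433, MXCSR_INVALID)</code> and <code>mxcsr_unmasked(6433, MXCSR_INVALID)</code> are both nonzero, and <code>fits_in_int64(9223372036854775808.0)</code> is zero. That appears consistent with the trace: the value in xmm1 is 2^63, one past the largest signed 64-bit integer, so the conversion at <code>cvttsd2siq</code> would signal invalid-operation, and with that exception unmasked it is delivered as an arithmetic trap. This does not explain why the same code does not trap on Linux.</div>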

<div>-bob</div><div><br></div><div> </div></div>
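<div>The buffer truncation discussed in the quoted thread can be reproduced with a small sketch. This is not the ERTS source: the format string is reconstructed from the printed message and from the stated length of 57 characters including the terminator, <code>format_fpe_message</code> is a hypothetical name, and <code>0x%lx</code> stands in for a pointer conversion so that the output is the same on every platform.</div>

```c
#include <stdio.h>

/* Hypothetical stand-in for the error formatting discussed in the thread.
 * Writes at most cap bytes (including the terminating NUL) into buf and
 * returns the length the full message needs, excluding the NUL, which is
 * what snprintf reports even when the output was cut short. */
int format_fpe_message(char *buf, size_t cap,
                       unsigned long check_site, unsigned long fault_pc)
{
    return snprintf(buf, cap,
                    "ERTS_FP_CHECK_INIT at 0x%lx: detected unhandled FPE at 0x%lx\r\n",
                    check_site, fault_pc);
}
```

<div>Called with a 64-byte buffer and the addresses from the thread (0x10086210 and 0x10025433), the stored text ends at "FPE at 0x1": the rest of the address and the trailing \r\n do not fit, so the shell's next output, 9223372036854775808, lands directly after it. The return value of 72 shows how much room this particular message needed. With two 64-bit pointers printed as 18 characters each, the same arithmetic gives the 89 bytes mentioned in the thread.</div>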