[erlang-bugs] infinite loop when beam.smp compiled with -O2 on debian lenny

Chetan Ahuja chetan.ahuja@REDACTED
Tue May 4 01:43:27 CEST 2010


Mikael,

   Thanks a lot for that catch. I think that's it. I just recompiled with
your patch (with -O2): the body of the loop now shows up in the generated
code and the trivial spin loop is gone.

 I got blindsided by the optimizer completely eliminating the body of the
loop, which meant I couldn't even see urqbp on the stack at all! This
led me to assume that the surrounding macro
(ERTS_POLL_USE_UPDATE_REQUESTS_QUEUE) was perhaps undefined and that the loop
wasn't even compiled in. Yet another strike against coding C in
pre-processor macros.
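
For anyone finding this thread later, here is a minimal standalone sketch of
the pattern in question. It is not the actual erl_poll.c code: the struct and
function names (block, count_pending) are made up for illustration, and only
the shape of the loop follows the patch quoted below.

```c
#include <stddef.h>

/* Hypothetical stand-in for ErtsPollSetUpdateRequestsBlock. */
struct block {
    struct block *next;
    int len;
};

/* Sum len over a singly linked list of blocks. */
int count_pending(const struct block *b)
{
    int pending = 0;

    while (b) {
        pending += b->len;
        /* This is the kind of line the patch adds. Without it, b never
         * changes, so for a non-empty list the loop cannot exit and the
         * sum is never used; an optimizer may then drop the body and
         * leave only a jump back to itself. */
        b = b->next;
    }
    return pending;
}
```

With the advance in place, a three-node list with len 1, 2 and 3 gives 6, and
an empty list gives 0.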

  Overall, it's a big relief to know that our standard install of gcc is
not generating such obviously buggy code. I look forward to seeing the
erts_poll_info fix in an upcoming git version.

Thanks a lot once again
Chetan



On Mon, May 3, 2010 at 2:54 PM, Mikael Pettersson <mikpe@REDACTED> wrote:

> Chetan Ahuja writes:
>  > Hi,
>  >
>  > We hit a bug while running rabbitmq where the beam.smp process was stuck
>  > in a tight loop in the erts_poll_info function. The process was eating up
>  > 100% of exactly one core (on a multi-core box) and rabbitmq was
>  > dysfunctional. Unfortunately, I could not create a small test case to
>  > reproduce this condition, but it would happen quite frequently while
>  > rabbitmq was in operation.
>  >
>  > The C code for the function didn't provide any hints on what would have
>  > been spinning in that function (first time looking at this codebase
>  > though). Finally, looking through the disassembly in gdb (at the point
>  > where our process was spinning), I saw the following lines in the
>  > erts_poll_info_kp function:
>  >
>  >
>  > 0x00000000004f0fe9 <erts_poll_info_kp+185>:     nopl   0x0(%rax)
>  > 0x00000000004f0ff0 <erts_poll_info_kp+192>:     jmp    0x4f0fe9 <erts_poll_info_kp+185>
>  >
>  > (Similar assembly code can be seen when the KERNEL_POLL option is
>  > disabled.)
>  >
>  > Clearly the above will trivially spin forever anytime we get into that
>  > codepath. It looks suspiciously like some code got optimized out by the
>  > compiler, leaving the crazy loop code.
>  >
>  > So I compiled with -O1 and then with no optimization at all. With -O1, I
>  > saw a weird jmp instruction jumping to its own address:
>  >
>  > 0x0000000000517102 <erts_poll_info_kp+60>:      jmp    0x517102 <erts_poll_info_kp+60>
>  >
>  > With no optimization, none of those trivial spins existed, but I didn't
>  > analyze the unoptimized code enough to say whether it can be proven to
>  > have an infinite loop (i.e., whether the optimizing compiler is simply
>  > doing its job vs. this being a compiler bug).
>  >
>  > Anyway, this problem has existed at least since the
>  > erlang-base_12.b.3-dfsg debian package version and has been verified to
>  > exist in the github version as of today.
>  >
>  >
>  > Here's the gcc and debian version info:
>  >  $ gcc --version
>  > gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2
>  > Copyright (C) 2008 Free Software Foundation, Inc.
>
> I looked at the procedure in question (not so easy to locate due to
> some "creative" C preprocessor abuse), and noticed an obvious bug:
> there's a loop over a linked list that forgets to actually advance
> the node pointer to the next element. When optimizing, gcc will notice
> that the loop doesn't terminate, omit the body of the loop (the
> calculations are dead), which will result in the type of object code
> shown above. Thus, it's an Erlang VM bug not a gcc miscompilation.
>
> Try the patch below and let us know if it solves your problem.
>
> /Mikael
>
> --- otp_src_R13B03/erts/emulator/sys/common/erl_poll.c.~1~      2009-03-12 13:16:29.000000000 +0100
> +++ otp_src_R13B03/erts/emulator/sys/common/erl_poll.c  2010-05-03 23:41:32.000000000 +0200
> @@ -2404,6 +2404,7 @@ ERTS_POLL_EXPORT(erts_poll_info)(ErtsPol
>        while (urqbp) {
>            size += sizeof(ErtsPollSetUpdateRequestsBlock);
>            pending_updates += urqbp->len;
> +           urqbp = urqbp->next;
>        }
>     }
>  #endif
>

