reproduceable livelock (?) in R13B04

Matthias Lang matthias@REDACTED
Thu Nov 18 23:30:03 CET 2010


Hi,

more information: I can't reproduce it with R14B.

I went through git commits and 3e645422ef4 looked promising ("fix
livelock in erts_poll_info_kp()"), but applying it to R13B04 didn't
make the problem go away. Not entirely unexpected; the problem fixed
isn't consistent with the strace I got on MIPS.

This might be a good time to try out git-bisect.

Matt

----------------------------------------------------------------------

On Thursday, November 18, Matthias Lang wrote:
> Hi,
> 
> while chasing a related bug, I found that the program below can get
> the beam VM into a state where it eats CPU but fails to make forward
> progress.
> 
> I expect the program to print _something_ new at one second intervals,
> no matter what. Here's the program:
> 
>    -module(poll).
>    -export([go/0]).
>    
>    go() ->
>        file:delete("/tmp/pid"),
>        spawn(fun() -> os:cmd("sleep 1; echo -n $? > /tmp/pid") end),
>        wait_for_ev(),
>        go().
>    
>    wait_for_ev() ->
>        case file:read_file("/tmp/pid") of
>    	{ok, <<>>} -> io:fwrite("race!\n");
>    	{ok, _} -> io:fwrite(".");
>    	{error, enoent} -> wait_for_ev();
>    	X -> io:fwrite("unexpected: ~p\n", [X])
>        end.
> 
> On my dual-core 64-bit dual core desktop machine (SMP, the program runs as
> expected for a few minutes, printing various things, but then it
> produces no more output. 'top' says it's eating 100% CPU. That's happened
> within three minutes three times in a row now, i.e. it's reproduceable on
> that particular machine.
> 
>   Erlang R13B04 (erts-5.7.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]
> 
>   Linux contorpis.corelatus.se 2.6.32-5-amd64 #1 SMP Sat Oct 30 14:18:21 UTC 2010 x86_64 GNU/Linux
> 
> On my single-core 32-bit laptop, I can't provoke the same hang, i.e. the
> program runs as it should.
> 
> On a single-core MIPS system, I also get the hang. On the MIPS system,
> strace shows me:
> 
> read(8, "", 262144)                     = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 494810314}) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 495844944}) = 0
> poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0,
> events=POLLIN|POLLRDNORM}], 3, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 498175425}) = 0
> read(8, "", 262144)                     = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 501869833}) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 502907388}) = 0
> poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0,
> events=POLLIN|POLLRDNORM}], 3, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 505240833}) = 0
> read(8, "", 262144)                     = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 507228944}) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 508265759}) = 0
> poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0,
> events=POLLIN|POLLRDNORM}], 3, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 513049685}) = 0
> read(8, "", 262144)                     = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 515038870}) = 0
> gettimeofday({1290081697, 234682}, NULL) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 517034759}) = 0
> poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0,
> events=POLLIN|POLLRDNORM}], 3, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 519364425}) = 0
> 
> I can see that FD=8 is /tmp/pid.
> 
> I haven't tried reproducing this on R14B (yet). I can't see anything in
> the R14B release notes which matches my symptoms, OTP-8773 is the closest.
> 
> Is this a known problem?
> 
> Matt


More information about the erlang-bugs mailing list