reproduceable livelock (?) in R13B04
Matthias Lang
matthias@REDACTED
Thu Nov 18 23:30:03 CET 2010
Hi,
more information: I can't reproduce it with R14B.
I went through git commits and 3e645422ef4 looked promising ("fix
livelock in erts_poll_info_kp()"), but applying it to R13B04 didn't
make the problem go away. Not entirely unexpected; the problem fixed
isn't consistent with the strace I got on MIPS.
This might be a good time to try out git-bisect.
Matt
----------------------------------------------------------------------
On Thursday, November 18, Matthias Lang wrote:
> Hi,
>
> while chasing a related bug, I found that the program below can get
> the beam VM into a state where it eats CPU but fails to make forward
> progress.
>
> I expect the program to print _something_ new at one second intervals,
> no matter what. Here's the program:
>
> -module(poll).
> -export([go/0]).
>
> go() ->
> file:delete("/tmp/pid"),
> spawn(fun() -> os:cmd("sleep 1; echo -n $? > /tmp/pid") end),
> wait_for_ev(),
> go().
>
> wait_for_ev() ->
> case file:read_file("/tmp/pid") of
> {ok, <<>>} -> io:fwrite("race!\n");
> {ok, _} -> io:fwrite(".");
> {error, enoent} -> wait_for_ev();
> X -> io:fwrite("unexpected: ~p\n", [X])
> end.
>
> On my dual-core 64-bit dual core desktop machine (SMP, the program runs as
> expected for a few minutes, printing various things, but then it
> produces no more output. 'top' says it's eating 100% CPU. That's happened
> within three minutes three times in a row now, i.e. it's reproduceable on
> that particular machine.
>
> Erlang R13B04 (erts-5.7.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]
>
> Linux contorpis.corelatus.se 2.6.32-5-amd64 #1 SMP Sat Oct 30 14:18:21 UTC 2010 x86_64 GNU/Linux
>
> On my single-core 32-bit laptop, I can't provoke the same hang, i.e. the
> program runs as it should.
>
> On a single-core MIPS system, I also get the hang. On the MIPS system,
> strace shows me:
>
> read(8, "", 262144) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 494810314}) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 495844944}) = 0
> poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0,
> events=POLLIN|POLLRDNORM}], 3, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 498175425}) = 0
> read(8, "", 262144) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 501869833}) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 502907388}) = 0
> poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0,
> events=POLLIN|POLLRDNORM}], 3, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 505240833}) = 0
> read(8, "", 262144) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 507228944}) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 508265759}) = 0
> poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0,
> events=POLLIN|POLLRDNORM}], 3, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 513049685}) = 0
> read(8, "", 262144) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 515038870}) = 0
> gettimeofday({1290081697, 234682}, NULL) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 517034759}) = 0
> poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0,
> events=POLLIN|POLLRDNORM}], 3, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {56121, 519364425}) = 0
>
> I can see that FD=8 is /tmp/pid.
>
> I haven't tried reproducing this on R14B (yet). I can't see anything in
> the R14B release notes which matches my symptoms, OTP-8773 is the closest.
>
> Is this a known problem?
>
> Matt
More information about the erlang-bugs
mailing list