[erlang-questions] How to diagnose stuck Erlang node
Mon Nov 28 13:37:49 CET 2011
I created a crash dump on a test machine (using the halfword emulator) and
found user_drv in the waiting state with
Program counter - 0x0000000002845f00 (user_drv:server_loop/5 + 48)
So it is at the same instruction (but not running):
0000000002845E80: i_func_info_IaaI 0 user_drv server_loop 5
0000000002845EA8: allocate_init_tIy 7 5 y(0)
0000000002845EC0: init_y y(1)
0000000002845ED0: move2_xyxy x(4) y(2) x(3) y(3)
0000000002845EE0: move2_xyxy x(2) y(4) x(1) y(5)
0000000002845EF0: move_ry x(0) y(6)
0000000002845F00: i_loop_rec_fr f(0000000002846C80) x(0)
0000000002845F10: i_select_tuple_arity2_rfAfAf x(0) f(0000000002846C40) 2 f(0000000002845F40) 3 f(0000000002846418)
0000000002845F40: i_get_tuple_element_rPx x(0) 0 x(1)
0000000002846C80: wait_f f(0000000002845F00)
0000000002846C90: badmatch_r x(0)
So it's just a waiting loop. I don't see how the process could be running
when the only output for some time was "ALIVE" messages every 15 minutes.
It looks like the only way to see what was going on is to get a complete crash
dump, but it was truncated by heart :-\
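On a node that still answers, the same question (is the process running or waiting, and in which function) can be asked directly instead of through a crash dump. The sketch below uses a hypothetical module, wait_demo, which is not user_drv's actual code: a process parked in a receive should report the status waiting, and its current function is the loop containing that receive. It relies only on the standard BIF erlang:process_info/2 with a list of items.

```erlang
%% Hypothetical module (NOT user_drv's code): a process parked in a
%% receive loop, similar in shape to a server loop.
-module(wait_demo).
-export([start/0, server_loop/1, inspect/1]).

start() ->
    spawn(?MODULE, server_loop, [0]).

%% Blocks in receive until a message arrives. The receive compiles to
%% loop_rec/wait instructions (compare i_loop_rec_fr and wait_f in the
%% disassembly above).
server_loop(N) ->
    receive
        {ping, From} ->
            From ! {pong, N},
            server_loop(N + 1);
        stop ->
            ok
    end.

%% Returns {Status, {Module, Function, Arity}} for Pid.
inspect(Pid) ->
    [{status, Status}, {current_function, MFA}] =
        erlang:process_info(Pid, [status, current_function]),
    {Status, MFA}.
```

Against a live node, the same erlang:process_info/2 call could be made on whereis(user_drv), assuming the process is registered under that name (the crash dump lists it by that name).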
P.S. It's quite strange that the crash dump shows +48.
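The addresses in the disassembly make it possible to check which offsets the reported program counter could correspond to. The sketch below (hypothetical module pc_offsets) assumes 8-byte instruction words, which matches the 16-byte step from init_y y(1) at ...EC0 to move2_xyxy at ...ED0. None of these obvious interpretations gives 48; this only rules out those readings and does not explain how the crash dump computes its offset.

```erlang
%% Hypothetical helper: offsets of the reported PC, computed from the
%% addresses in the user_drv:server_loop/5 disassembly.
-module(pc_offsets).
-export([offsets/0]).

offsets() ->
    FuncInfo   = 16#2845E80, % i_func_info_IaaI (start of server_loop/5)
    FirstInstr = 16#2845EA8, % allocate_init_tIy (first instruction after func_info)
    PC         = 16#2845F00, % i_loop_rec_fr, the reported program counter
    WordSize   = 8,          % assumption: 8-byte instruction words
    #{from_func_info_bytes   => PC - FuncInfo,
      from_func_info_words   => (PC - FuncInfo) div WordSize,
      from_first_instr_bytes => PC - FirstInstr,
      from_first_instr_words => (PC - FirstInstr) div WordSize}.
```

This gives 128 bytes (16 words) from func_info and 88 bytes (11 words) from the first instruction.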
> +48 does not point to an instruction start on a couple of 32-bit systems I
> have access to, so I cannot assist you further.
> To get an instruction dump named "user_drv.dis" in the beam process's working
> directory, you can do
> Happy bug-hunting.
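The command itself did not survive in the archive. One function that is known to write such a file is erts_debug:df/1, an undocumented debugging helper that writes "<Module>.dis" into the emulator's current working directory; whether it is what the original reply used is an assumption. The wrapper module dis_demo below is hypothetical and only returns the path of the file it expects df/1 to create.

```erlang
%% Hypothetical wrapper around the undocumented erts_debug:df/1,
%% which writes a disassembly of Mod to "<Mod>.dis" in the cwd.
-module(dis_demo).
-export([dump/1]).

dump(Mod) when is_atom(Mod) ->
    {ok, Cwd} = file:get_cwd(),
    %% Only the side effect (the .dis file) matters; the return value
    %% is ignored and any exception is caught.
    _ = (catch erts_debug:df(Mod)),
    filename:join(Cwd, atom_to_list(Mod) ++ ".dis").
```

From a shell on the affected node, dis_demo:dump(user_drv) would be expected to produce user_drv.dis, in which the address from the crash dump can be looked up.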
> On Mon, 28 Nov 2011 12:01:17 +0200, Kirill Zaborsky <qrilka@REDACTED>
> I'm using the halfword emulator on 64-bit Ubuntu Server.
>> And the process state is not "waiting" but "running". Previous crash dumps
>> show the same program counter value (and user_drv in the running state).
>> Kind regards,
>> Kirill Zaborsky
>> 2011/11/28 Dennis Novikov <dennis.novikov@REDACTED>
>> On Mon, 28 Nov 2011 08:44:42 +0200, Kirill Zaborsky <qrilka@REDACTED>
>>> Trying to find a workaround for this "stuck node" scenario, I've
>>>> to R14B04 and turned on "heart".
>>>> But recently the node once again stopped responding, and heart did not
>>>> consider it stuck, although I could not contact it.
>>>> I tried to get a crash dump with 'kill -USR1', but once again the
>>>> crash dump was truncated. Does heart kill "dead" Erlang
>>>> The only thing that could be seen from the crash dump was that the only
>>>> running process was user_drv (just like previous times), with the program
>>>> counter equal to "user_drv:server_loop/5 + 48". Is it possible to find out
>>>> what exactly it stands for?
>>> Waiting on receive in that function. And are you observing this on a
>>> 32-bit VM?
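Since the crash dumps in this thread were truncated, it helps to pull the running processes out of whatever part was written. The sketch below (hypothetical module dump_running) assumes the dump layout in which each process section starts with a "=proc:<Pid>" header followed by "State:", "Name:" and "Program counter:" lines, and that any other line beginning with "=" starts a new section. It returns one map per process whose state is Running.

```erlang
%% Hypothetical helper: list "Running" processes in a crash dump.
%% Assumes "=proc:<Pid>" headers followed by "State: ...", "Name: ..."
%% and "Program counter: ..." lines.
-module(dump_running).
-export([running/1]).

running(File) ->
    {ok, Bin} = file:read_file(File),
    Lines = binary:split(Bin, [<<"\r\n">>, <<"\n">>], [global]),
    Procs = procs(Lines, undefined, []),
    [P || #{state := <<"Running">>} = P <- Procs].

%% Fold lines into one map per =proc: section.
procs([], Cur, Acc) ->
    lists:reverse(push(Cur, Acc));
procs([<<"=proc:", Pid/binary>> | Rest], Cur, Acc) ->
    procs(Rest, #{pid => Pid}, push(Cur, Acc));
procs([<<"=", _/binary>> | Rest], Cur, Acc) ->
    %% Any other section header ends the current process section.
    procs(Rest, undefined, push(Cur, Acc));
procs([Line | Rest], Cur, Acc) when is_map(Cur) ->
    procs(Rest, field(Line, Cur), Acc);
procs([_ | Rest], undefined, Acc) ->
    procs(Rest, undefined, Acc).

push(undefined, Acc) -> Acc;
push(Proc, Acc) -> [Proc | Acc].

field(<<"State: ", V/binary>>, P) -> P#{state => V};
field(<<"Name: ", V/binary>>, P) -> P#{name => V};
field(<<"Program counter: ", V/binary>>, P) -> P#{pc => V};
field(_, P) -> P.
```

For example, dump_running:running("erl_crash.dump") on the dumps described above would be expected to list user_drv, provided the truncated file still contains its section.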