Thanks, Dennis,<div>I have created a crash dump on a test machine (using the halfword emulator) and got user_drv in the waiting state with</div><div>Program counter - 0x0000000002845f00 (user_drv:server_loop/5 + 48)</div><div>
So it's on the same instruction (but not running)</div><div><br></div><div>Disassembly shows:</div><div>-----------------</div><div><div>0000000002845E80: i_func_info_IaaI 0 user_drv server_loop 5 </div><div>0000000002845EA8: allocate_init_tIy 7 5 y(0) </div>
<div>0000000002845EC0: init_y y(1) </div><div>0000000002845ED0: move2_xyxy x(4) y(2) x(3) y(3) </div><div>0000000002845EE0: move2_xyxy x(2) y(4) x(1) y(5) </div><div>0000000002845EF0: move_ry x(0) y(6) </div><div>0000000002845F00: i_loop_rec_fr f(0000000002846C80) x(0) </div>
<div>0000000002845F10: i_select_tuple_arity2_rfAfAf x(0) f(0000000002846C40) 2 f(0000000002845F40) 3 f(0000000002846418) </div><div>0000000002845F40: i_get_tuple_element_rPx x(0) 0 x(1) </div><div>.....</div><div><div>0000000002846C80: wait_f f(0000000002845F00) </div>
<div>0000000002846C90: badmatch_r x(0) </div></div><div>-----------------</div><div>So it's just a waiting loop. I don't see how the process could be running when the only output for some time was the "ALIVE" messages every 15 minutes from run_erl.</div>
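<div><br></div><div>As a rough cross-check of the "+48" in the program counter line, the addresses from the listing above can simply be subtracted. This is only a sketch: the constant names are mine, and which base address the crash dump measures the offset from (and in which unit) is an assumption I have not verified.</div>

```python
# Addresses copied verbatim from the disassembly listing above.
FUNC_INFO = 0x0000000002845E80    # i_func_info_IaaI ... user_drv server_loop 5
FIRST_INSTR = 0x0000000002845EA8  # allocate_init_tIy, first instruction after func_info
PC = 0x0000000002845F00           # i_loop_rec_fr, the reported program counter


def byte_offset(base, addr):
    """Return the distance in bytes from base to addr."""
    return addr - base


# Two candidate bases (hypothetical: the dump does not say which one it uses).
offsets = {
    "from i_func_info": byte_offset(FUNC_INFO, PC),
    "from first instruction": byte_offset(FIRST_INSTR, PC),
}

for label, off in offsets.items():
    print(f"{label}: {off} bytes (hex {off:#x})")
```

<div>Neither byte distance is 48, so the reported offset is presumably not a plain byte count from either of these two addresses.</div>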
<div>Looks like the only way to see what was going on is to get a complete crash dump, but it was truncated by heart :-\</div><div><br></div><div>P.S. It's quite strange that the crash dump shows +48</div><br><div class="gmail_quote">
2011/11/28 <span dir="ltr"><<a href="mailto:dennis.novikov@gmail.com">dennis.novikov@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">+48 does not point to an instruction start on a couple of 32-bit systems I have access to, so I can not assist you further.<br>
<br>
To get an instruction dump named "user_drv.dis" in the beam process's working directory, you can run<br>
<br>
erts_debug:df(user_drv).<br>
<br>
Happy bug-hunting.<div class="HOEnZb"><div class="h5"><br>
<br>
<br>
On Mon, 28 Nov 2011 12:01:17 +0200, Kirill Zaborsky <<a href="mailto:qrilka@gmail.com" target="_blank">qrilka@gmail.com</a>> wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I'm using the halfword emulator on 64-bit Ubuntu Server<br>
And the process state is not "waiting" but "running". Previous crash dumps<br>
show the same program counter value (and user_drv in running state)<br>
<br>
Kind regards,<br>
Kirill Zaborsky<br>
<br>
<br>
2011/11/28 Dennis Novikov <<a href="mailto:dennis.novikov@gmail.com" target="_blank">dennis.novikov@gmail.com</a>><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Mon, 28 Nov 2011 08:44:42 +0200, Kirill Zaborsky <<a href="mailto:qrilka@gmail.com" target="_blank">qrilka@gmail.com</a>><br>
wrote:<br>
<br>
Trying to find any workaround to this "stuck node" scenario I've upgraded<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
to R14B04 and turned on "heart".<br>
But recently the node once again stopped responding, and heart did not<br>
consider it stuck even though I could not contact it.<br>
I've tried to get a crash dump with 'kill -USR1', but once again the<br>
crash dump was truncated. Does heart kill a "dead" Erlang node?<br>
The only thing that could be seen from the crash dump was that the only<br>
running process was user_drv (just like the previous times) with program<br>
counter equal to "user_drv:server_loop/5 + 48". Is it possible to find out<br>
what exactly that stands for?<br>
<br>
</blockquote>
<br>
Waiting on receive in that function. And you are observing this on a<br>
32-bit VM.<br>
<br>
--<br>
WBR,<br>
DN<br>
<br>
</blockquote></blockquote>
<br>
<br></div></div><span class="HOEnZb"><font color="#888888">
-- <br>
WBR,<br>
DN<br>
</font></span></blockquote></div><br></div>