[erlang-questions] How to diagnose stuck Erlang node

Kirill Zaborsky qrilka@REDACTED
Fri Oct 28 10:10:05 CEST 2011


Just 2 days have passed and the Erlang node got stuck once again.
This time I killed it with SIGUSR1 and received a crash dump.
Checking all the logs on the host didn't give any hints as to where the
problem may be.
In the crash dump the only suspicious thing is that user_drv has a message
queue length of 7550. The program counter points
to user_drv:server_loop/5 + 48. Is there any way to find out which
instruction in the source code that corresponds to?
BTW, the crash dump viewer says that the crash dump was truncated. Is there
any way to get a full crash dump?
The system is running R14B03, if it matters.
Any advice is welcome.

Kind regards,
Kirill Zaborsky

2011/10/26 Kirill Zaborsky <qrilka@REDACTED>

> Recently we have found some problems with our Erlang application:
> For some time the system works OK (e.g. before today it ran with no problems
> for at least 17 days). Then something happens and it gets stuck. It does not
> respond to pings, and the HTTP interface (mochiweb) gives no replies. The only
> thing that can be observed is the standard "ALIVE" message sent to stdout
> every 15 minutes when there is no output to stdout. Messages in the logs show
> nothing special before logging stops.
> The only thing I could do is kill the emulator. That gives me the
> opportunity to restart the system but gives no additional information about
> the root of the problem.
> On the JVM it's possible to get a thread dump (using the QUIT signal). Is
> there some way to "manually" force the Erlang emulator to produce a crash dump
> without using erlang:halt/1?
> Are there other ways to diagnose this problem which I should take a
> look at?
>
> Kind regards,
> Kirill Zaborsky
>