[erlang-questions] How to diagnose stuck Erlang node

Kirill Zaborsky qrilka@REDACTED
Fri Oct 28 14:30:55 CEST 2011


Nothing worth mentioning: net_kernel is just in the waiting state. In the
dump there are only 3 processes not in the waiting state: user_drv running,
and 2 processes in the scheduled state in erlang:apply (one in
unicode:ml_map/3 and the other in cl_consumer:consumer_wait/0, called from
OSERL).

Kind regards,
Kirill Zaborsky

2011/10/28 Ahmed Omar <spawn.think@REDACTED>

> I saw similar behavior, but with the rex process (rpc server) having a very
> long queue. In your case I wonder what's loading the user_drv process, but
> as you mentioned, the crash dump was truncated.
> Do you see anything in the crash dump about the net_kernel process?
>
>
> On Fri, Oct 28, 2011 at 1:41 PM, Kirill Zaborsky <qrilka@REDACTED> wrote:
>
>> It's a server that collects information from GPS trackers (sent by SMS or
>> HTTP/GPRS) and uses Yaws to receive information over HTTP. Processed
>> information goes to PostgreSQL (epgsql is used). This info can then be
>> shown with qooxdoo/mapserver; the web interface backend is done with
>> mochiweb.
>> There are no custom NIFs or C nodes, so I'm not sure how this
>> application could hang the Erlang emulator.
>> On another Erlang system built by our company we have experienced similar
>> behaviour when a user was connected to the server shell (using nodetool's
>> attach command, i.e. the to_erl program) and the ssh session was then
>> broken. The long message queue for user_drv may be connected to something
>> like that. The problem is that on the system I'm trying to diagnose there
>> was no shell/console connected, so it's unclear what could trigger such a
>> problem.
>>
>> Kind regards,
>> Kirill Zaborsky
>>
>> 2011/10/28 Ahmed Omar <spawn.think@REDACTED>
>>
>>> Maybe providing some information about what your application is doing
>>> might help?
>>>
>>>
>>> On Fri, Oct 28, 2011 at 10:47 AM, Kirill Zaborsky <qrilka@REDACTED> wrote:
>>>
>>>> About the message queue, the crash dump viewer says "The dump is
>>>> truncated, no data available", so I've got no more information :-\
>>>> epmd -names showed the node running, but I could not contact it.
>>>>
>>>> Kind regards,
>>>> Kirill Zaborsky
>>>>
>>>>
>>>> 2011/10/28 Ahmed Omar <spawn.think@REDACTED>
>>>>
>>>>> Are you able to expand the message queue of the user_drv process? That
>>>>> might give some information.
>>>>> Did you check the epmd status before dumping?
>>>>>
>>>>> On Fri, Oct 28, 2011 at 10:10 AM, Kirill Zaborsky <qrilka@REDACTED> wrote:
>>>>>
>>>>>> Just 2 days passed and the Erlang node got stuck once again.
>>>>>> This time I killed it with SIGUSR1 and got a crash dump.
>>>>>> Checking all the logs on the host didn't give any hints as to where
>>>>>> the problem may be.
>>>>>> In the crash dump the only suspicious thing is that user_drv has a
>>>>>> message queue length of 7550. The program counter points
>>>>>> to user_drv:server_loop/5 + 48. Is there any way to find out which
>>>>>> instruction in the source code that corresponds to?
>>>>>> BTW, the crash dump viewer says that the crash dump was truncated. Is
>>>>>> there any way to get a full crash dump?
>>>>>> The system is running R14B03, if it matters.
>>>>>> Any advice is welcome.
>>>>>>
>>>>>> Kind regards,
>>>>>> Kirill Zaborsky
>>>>>>
>>>>>> 2011/10/26 Kirill Zaborsky <qrilka@REDACTED>
>>>>>>
>>>>>>> Recently we have found some problems with our Erlang application.
>>>>>>> For some time the system works OK (e.g. before today it ran with no
>>>>>>> problems for at least 17 days). Then something happens and it gets
>>>>>>> stuck. It does not respond to pings, and the HTTP interface (mochiweb)
>>>>>>> gives no replies. The only thing that can be observed is the standard
>>>>>>> "ALIVE" message sent to stdout every 15 minutes when there is no
>>>>>>> output to stdout. Messages in the logs show nothing special before
>>>>>>> logging stops.
>>>>>>> The only thing I could do is kill the emulator. That gives me the
>>>>>>> opportunity to restart the system but gives no additional information
>>>>>>> about the root of the problem.
>>>>>>> On the JVM it's possible to get a thread dump (using the QUIT signal).
>>>>>>> Is there some way to "manually" force the Erlang emulator to produce a
>>>>>>> crash dump without using erlang:halt/1?
>>>>>>> Are there other ways to diagnose this problem which I should
>>>>>>> take a look at?
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Kirill Zaborsky
>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> erlang-questions mailing list
>>>>>> erlang-questions@REDACTED
>>>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> - Ahmed Omar
>>>>> http://nl.linkedin.com/in/adiaa
>>>>> Follow me on twitter
>>>>> @spawn_think <http://twitter.com/#!/spawn_think>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> - Ahmed Omar
>>> http://nl.linkedin.com/in/adiaa
>>> Follow me on twitter
>>> @spawn_think <http://twitter.com/#!/spawn_think>
>>>
>>>
>>
>
>
> --
> Best Regards,
> - Ahmed Omar
> http://nl.linkedin.com/in/adiaa
> Follow me on twitter
> @spawn_think <http://twitter.com/#!/spawn_think>
>
>
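
The thread leans on the crash dump viewer, which refused to show the
user_drv message queue because the dump was truncated. Since an Erlang crash
dump is a plain-text file, the per-process headers can still be read directly.
The following is only a minimal sketch, not something posted in the thread: it
assumes the usual layout of "=proc:<pid>" section headers followed by
"Key: value" lines (for example "Message queue length"), and the function
names and the sample dump (pids, program counter address) are made up for
illustration.

```python
import re
from typing import Dict, List

# Header of a process section, e.g. "=proc:<0.25.0>" (assumed dump layout).
PROC_HEADER = re.compile(r"^=proc:(<[\d.]+>)\s*$")


def parse_processes(dump_text: str) -> List[Dict[str, str]]:
    """Collect the 'Key: value' fields of every =proc: section."""
    processes: List[Dict[str, str]] = []
    current = None
    for line in dump_text.splitlines():
        if line.startswith("="):
            # Every '=' line opens a new section; only =proc: ones are kept.
            match = PROC_HEADER.match(line)
            current = {"Pid": match.group(1)} if match else None
            if current is not None:
                processes.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return processes


def busiest(processes: List[Dict[str, str]], limit: int = 5) -> List[Dict[str, str]]:
    """Return the processes with the longest message queues first."""
    def queue_length(proc: Dict[str, str]) -> int:
        value = proc.get("Message queue length", "0")
        # A truncated section may lack the field; treat it as 0.
        return int(value) if value.isdigit() else 0

    return sorted(processes, key=queue_length, reverse=True)[:limit]


# Hypothetical sample; the last section is cut short, as in a truncated dump.
SAMPLE = """\
=erl_crash_dump:0.1
=proc:<0.25.0>
State: Running
Name: user_drv
Message queue length: 7550
Program counter: 0x0000000000000000 (user_drv:server_loop/5 + 48)
=proc:<0.12.0>
State: Waiting
Name: net_kernel
Message queue length: 0
=proc:<0.99.0>
State: Scheduled
"""

for proc in busiest(parse_processes(SAMPLE), limit=2):
    print(proc["Pid"], proc.get("Name", "-"), proc.get("State", "?"),
          proc.get("Message queue length", "?"))
```

On a real dump, read the file (commonly erl_crash.dump) and pass its text to
parse_processes instead of SAMPLE. A section that was cut off simply yields
fewer fields, so the scan keeps working up to the point where the dump ends.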