[erlang-questions] Investigate an infinite loop on production servers

Vance Shipley vances@REDACTED
Fri May 24 11:27:43 CEST 2013


It "freezes"? So it hasn't crashed at all. In that case you just need to
be more patient and wait for it either to crash and leave a crash dump, or
to produce output in etop. Setting process priorities might help: give the
suspicious processes low priority.
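
A minimal sketch of the priority idea, assuming you control how the suspect processes are started. The module name low_prio and the function start/1 are made up for illustration. A process's priority can only be set by the process itself (process_flag(priority, low)) or at spawn time, so this sets it at spawn:

```erlang
-module(low_prio).
-export([start/1]).

%% Hypothetical helper: run Fun in a new process with low priority,
%% so that normal-priority processes (the shell, monitoring tools)
%% are preferred by the scheduler when the node is busy.
start(Fun) when is_function(Fun, 0) ->
    spawn_opt(Fun, [{priority, low}]).
```

For a gen_server the same option can be passed at start, e.g. gen_server:start_link(Mod, Args, [{spawn_opt, [{priority, low}]}]).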
If it's CPU resources that are being depleted, you want to see which
processes have the most reductions. Use etop to sort by reductions and see
who's the busiest.
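
If etop is not usable, roughly the same ranking can be taken from a shell on the node itself. This is only a sketch: the module name top_procs and the function by_reductions/1 are invented here, and the counts are cumulative since each process started, so take two samples and compare them to see who is busy right now.

```erlang
-module(top_procs).
-export([by_reductions/1]).

%% Return the N processes with the highest reduction counts as
%% {Reductions, Pid, OtherInfo} tuples, busiest first.
by_reductions(N) ->
    Items = [reductions, message_queue_len, memory, current_function],
    Rows = [{Reds, Pid, Rest}
            || Pid <- erlang:processes(),
               %% process_info/2 returns undefined for a process that
               %% has already exited; the pattern filters those out.
               [{reductions, Reds} | Rest] <-
                   [erlang:process_info(Pid, Items)]],
    lists:sublist(lists:reverse(lists:sort(Rows)), N).
```

Calling top_procs:by_reductions(10) from the shell lists the ten busiest processes along with their queue length, memory and current function.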

Another way would be to run a debug emulator and interrupt it while it's
frozen. Then inspect the backtrace to see what it has been doing.
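
On the quoted question about capping memory so that a crash dump still gets written, one rough option is a watchdog process inside the VM. The sketch below is an assumption-laden illustration, not a tested remedy: the module name mem_watchdog and its functions are hypothetical, it relies on erlang:halt/1 with a string argument writing a crash dump with that string as slogan, and if the node locks up within a second the watchdog may never get scheduled in time.

```erlang
-module(mem_watchdog).
-export([start/2, check/2]).

%% Spawn a watchdog that polls total VM memory every IntervalMs
%% milliseconds and aborts the node once LimitBytes is exceeded.
start(LimitBytes, IntervalMs) ->
    spawn(fun() ->
                  %% Run ahead of normal-priority processes when busy.
                  process_flag(priority, high),
                  loop(LimitBytes, IntervalMs)
          end).

loop(LimitBytes, IntervalMs) ->
    case check(erlang:memory(total), LimitBytes) of
        ok ->
            timer:sleep(IntervalMs),
            loop(LimitBytes, IntervalMs);
        over_limit ->
            %% halt/1 with a string produces a crash dump with the
            %% string as slogan, then exits the runtime system.
            erlang:halt("mem_watchdog: memory limit exceeded")
    end.

%% Pure comparison, kept separate so it can be tried in isolation.
check(Total, Limit) when Total > Limit -> over_limit;
check(_Total, _Limit) -> ok.
```

For example, mem_watchdog:start(8 * 1024 * 1024 * 1024, 1000) would abort the node once it passes 8 GB; the limit and interval are arbitrary values to adjust.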
On May 24, 2013 1:43 PM, "Morgan Segalis" <msegalis@REDACTED> wrote:

> The problem is that the VM freezes completely, it does not generate a
> crash dump
>
> Is there a way to limit the memory that a VM can allocate, so the server
> is not overwhelmed in order to create a crash dump ?
>
> On 24 May 2013, at 02:00, Vance Shipley <vances@REDACTED> wrote:
>
> Have you analyzed the crash dump file with the crash dump viewer?
>  On May 24, 2013 3:00 AM, "Morgan Segalis" <msegalis@REDACTED> wrote:
>
>> Yeah you got that right ! leaking at a huge rate at some point !
>>
>> - The number of Fd - I don't get close to the max
>> # cat /proc/sys/fs/file-nr
>> 3264    0       6455368
>>
>> - On the production server there is only the erlang node, no other
>> service…
>> The beam.smp was through the roof at 300% CPU and 97% RAM
>> The weird thing is that it got there in a second; I was looking at it
>> when it happened.
>>
>> - It has happened with 2000 connections, 4000 connections, and 10000
>> connections… 5 min after start, 5hours after start.
>>
>> I really can't find a pattern here…and I'm becoming a little desperate.
>>
>> Thank you for your help again.
>>
>> Morgan.
>>
>> On 23 May 2013, at 23:20, Dmitry Kolesnikov <dmkolesnikov@REDACTED> wrote:
>>
>> Your system is definitely leaking some resources :-/
>>  - Check the number of FDs in use; maybe you exceeded the limit there
>>  - What was the overall system memory / CPU utilisation before the crash?
>>  - Check how many connections you had before the crash; maybe you can
>> reproduce it in dev
>>
>> - Dmitry
>>
>> On May 24, 2013, at 12:13 AM, Morgan Segalis <msegalis@REDACTED> wrote:
>>
>> Ok, it finally got into the infinite loop…
>>
>> And of course, the node on which I was running etop could not give me
>> anything more, since it got disconnected from the production node.
>>
>> So back to square one… no way to investigate correctly so far :-/
>>
>> Morgan.
>>
>> On 23 May 2013, at 16:34, Morgan Segalis <msegalis@REDACTED> wrote:
>>
>> Yeah, that's what I'm doing right now, but of course, when I'm monitoring
>> it, it won't crash, only when I sleep !!
>>
>> I'll get back to the Erlang list as soon as I have more information about
>> this.
>>
>> Thank you all !
>>
>> Morgan.
>>
>> On 23 May 2013, at 16:30, Vance Shipley <vances@REDACTED> wrote:
>>
>> Keep etop running and capture the output to a file (e.g. etop ... | tee
>> etop.log). After it gets into trouble, look back and see what was happening
>> beforehand.
>>  On May 23, 2013 6:16 PM, "Morgan Segalis" <msegalis@REDACTED> wrote:
>>
>>> So I should go back to R15B ?
>>>
>>> erlang:memory() gives me
>>>
>>> [{total,1525779584},
>>>  {processes,1272881427},
>>>  {processes_used,1272789743},
>>>  {system,252898157},
>>>  {atom,372217},
>>>  {atom_used,346096},
>>>  {binary,148093608},
>>>  {code,8274446},
>>>  {ets,1546832}]
>>>
>>>
>>> But keep in mind that right now, there is no infinite loop or memory
>>> issue at this exact time…
>>> It would be more interesting to have that when the VM is asking for 14GB
>>> of memory, but when it does, the console is unresponsive, so I can't get
>>> anything then.
>>>
>>> On 23 May 2013, at 14:39, Dmitry Kolesnikov <dmkolesnikov@REDACTED> wrote:
>>>
>>> Right, you do not have many processes. At the same time you run out of memory…
>>>
>>> Unfortunately, I have had no time to play around with R16B in production…
>>> Could it be some issue with SSL? I recall there were some complaints on
>>> the list.
>>>
>>> I would use entop to spot the process that has too many
>>> reductions, too long a message queue, or too large a heap.
>>> Once you know their pids you can dig out more info about them using
>>> erlang:process_info(…) and/or sys:get_status(…)
>>>
>>> BTW, what does erlang:memory() say on your production node?
>>>
>>> - Dmitry
>>>
>>> On May 23, 2013, at 3:25 PM, Morgan Segalis <msegalis@REDACTED> wrote:
>>>
>>> No, I was talking about the function I made to investigate which
>>> processes I have created, which gives me this output :
>>>
>>> Dict: {dict,16,16,16,8,80,48,
>>>            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>>            {{[[{{connector_serv,init,1},[connector_suprc42,connector,<0.42.0>]}|548]],
>>>              [],
>>>              [[{{supervisor,connector_sup,1},[connector,<0.42.0>]}|3],
>>>               [{{connector_serv,init,1},[connector_supssl,connector,<0.42.0>]}|1460],
>>>               [{{supervisor,casserl_sup,1},[connector,<0.42.0>]}|1],
>>>               [{{supervisor,pushiphone_sup,1},[connector,<0.42.0>]}|2],
>>>               [{{pushiphone,init,1},['pushiphone-lite',connector,<0.42.0>]}|3],
>>>               [{{supervisor,clientpool_sup,1},[connector,<0.42.0>]}|1]],
>>>              [],
>>>              [[{{clientpool,init,1},[clientpool_sup,connector,<0.42.0>]}|1],
>>>               [undefined|4]],
>>>              [],
>>>              [[{{supervisor,connector,1},[<0.42.0>]}|1],
>>>               [{{casserl_serv,init,1},[casserl_sup,connector,<0.42.0>]}|50]],
>>>              [],[],[],
>>>              [[{{connector_serv,init,1},[connector_suprc4,connector,<0.42.0>]}|472],
>>>               [{{ssl_connection,init,1},
>>>                 [ssl_connection_sup,ssl_sup,<0.51.0>]}|
>>>                1366],
>>>               [{unknown,unknown}|3]],
>>>              [],[],
>>>              [[{{pushiphone,init,1},['pushiphone-full',connector,<0.42.0>]}|3]],
>>>              [],
>>>              [[{{pg2,init,1},[kernel_safe_sup,kernel_sup,<0.10.0>]}|1]]}}}
>>> ok
>>>
>>> I'm very satisfied with supervisor, and I don't think I have the
>>> expertise to tweak it...
>>>
>>> On 23 May 2013, at 14:19, Dmitry Kolesnikov <dmkolesnikov@REDACTED> wrote:
>>>
>>>
>>> On May 23, 2013, at 1:04 PM, Morgan Segalis <msegalis@REDACTED> wrote:
>>>
>>> I have made a little function a while back, getting all processes and
>>> removing the processes inited at the beginning…
>>>
>>>
>>> Could you please elaborate on that? Why are you not satisfied with
>>> the supervisor?
>>>
>>> - Dmitry
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>
>>>
>>
>>
>>
>>
>