[erlang-questions] Investigate an infinite loop on production servers

Vance Shipley vances@REDACTED
Fri May 24 11:05:43 CEST 2013


Have you set the ERL_CRASH_DUMP_SECONDS environment variable?
   The VM won't create a crash dump unless you set it to a positive value.
Set it to 60 or more to be sure the dump completes.
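
A minimal sketch of what that could look like in a Unix shell. The dump
path below is only an illustrative assumption, and ERL_CRASH_DUMP itself is
optional:

```shell
# Give the emulator up to 60 seconds to finish writing a crash dump.
export ERL_CRASH_DUMP_SECONDS=60

# Optional: where the dump file is written. This path is an assumption
# for illustration; without it the dump goes to erl_crash.dump in the
# node's current working directory.
export ERL_CRASH_DUMP=/var/tmp/erl_crash.dump

# Confirm what a node started from this shell would inherit.
echo "ERL_CRASH_DUMP_SECONDS=$ERL_CRASH_DUMP_SECONDS"
echo "ERL_CRASH_DUMP=$ERL_CRASH_DUMP"
```

Start the node from that same shell, or put the exports in the script that
launches it, so that the emulator inherits the values.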
 On May 24, 2013 1:43 PM, "Morgan Segalis" <msegalis@REDACTED> wrote:

> The problem is that the VM freezes completely; it does not generate a
> crash dump.
>
> Is there a way to limit the memory that a VM can allocate, so that the
> server is not too overwhelmed to create a crash dump?
>
> On May 24, 2013, at 02:00, Vance Shipley <vances@REDACTED> wrote:
>
> Have you analyzed the crash dump file with the crash dump viewer?
>  On May 24, 2013 3:00 AM, "Morgan Segalis" <msegalis@REDACTED> wrote:
>
>> Yeah, you got that right! It is leaking at a huge rate at some point!
>>
>> - The number of FDs: I don't get close to the max
>> # cat /proc/sys/fs/file-nr
>> 3264    0       6455368
>>
>> - On the production server there is only the Erlang node, no other
>> service…
>> The beam.smp was through the roof at 300% CPU and 97% RAM.
>> The weird thing is that it got there in a second; I was looking at it
>> when it happened.
>>
>> - It has happened with 2000 connections, 4000 connections, and 10000
>> connections… 5 minutes after start, 5 hours after start.
>>
>> I really can't find a pattern here…and I'm becoming a little desperate.
>>
>> Thank you for your help again.
>>
>> Morgan.
>>
>> On May 23, 2013, at 23:20, Dmitry Kolesnikov <dmkolesnikov@REDACTED>
>> wrote:
>>
>> Your system is definitely leaking some resources :-/
>>  - Check the number of used FDs; maybe you exceeded the limit there
>>  - What was the overall system memory / CPU utilisation before the crash?
>>  - Check how many connections you had before the crash; maybe you can
>> reproduce it in dev
>>
>> - Dmitry
>>
>> On May 24, 2013, at 12:13 AM, Morgan Segalis <msegalis@REDACTED> wrote:
>>
>> Ok, it finally got into the infinite loop…
>>
>> And of course, the node on which I was running etop could not give me
>> anything more, since it got disconnected from the production node.
>>
>> So back to square one… no way to investigate correctly so far :-/
>>
>> Morgan.
>>
>> On May 23, 2013, at 16:34, Morgan Segalis <msegalis@REDACTED> wrote:
>>
>> Yeah, that's what I'm doing right now, but of course, when I'm monitoring
>> it, it won't crash, only when I sleep!!
>>
>> I'll get back to the Erlang list as soon as I have more information about
>> this.
>>
>> Thank you all!
>>
>> Morgan.
>>
>> On May 23, 2013, at 16:30, Vance Shipley <vances@REDACTED> wrote:
>>
>> Keep etop running and capture the output to a file (e.g. etop ... | tee
>> stop.log). After it gets into trouble look back and see what was happening
>> beforehand.
>>  On May 23, 2013 6:16 PM, "Morgan Segalis" <msegalis@REDACTED> wrote:
>>
>>> So I should go back to R15B?
>>>
>>> erlang:memory() gives me
>>>
>>> [{total,1525779584},
>>>  {processes,1272881427},
>>>  {processes_used,1272789743},
>>>  {system,252898157},
>>>  {atom,372217},
>>>  {atom_used,346096},
>>>  {binary,148093608},
>>>  {code,8274446},
>>>  {ets,1546832}]
>>>
>>>
>>> But keep in mind that right now there is no infinite loop or memory
>>> issue…
>>> It would be more interesting to have this output when the VM is asking
>>> for 14GB of memory, but when it does, the console is unresponsive, so I
>>> can't get anything then.
>>>
>>> On May 23, 2013, at 14:39, Dmitry Kolesnikov <dmkolesnikov@REDACTED>
>>> wrote:
>>>
>>> Right, you do not have many processes. At the same time, you run out of
>>> memory…
>>>
>>> Unfortunately, I have had no time to play around with R16B in production…
>>> Could it be some issue with SSL? I recall there were some complaints on
>>> the list.
>>>
>>> I would use entop to spot the processes that have either too many
>>> reductions, too long a message queue, or too large a heap.
>>> Once you know their pids, you can dig out more info about them using
>>> erlang:process_info(…) and/or sys:get_status(…)
>>>
>>> BTW, what does erlang:memory() say on your production node?
>>>
>>> - Dmitry
>>>
>>> On May 23, 2013, at 3:25 PM, Morgan Segalis <msegalis@REDACTED> wrote:
>>>
>>> No, I was talking about the function I made to investigate which
>>> processes I have created, which gives me this output:
>>>
>>> Dict: {dict,16,16,16,8,80,48,
>>>            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>>            {{[[{{connector_serv,init,1},[connector_suprc42,connector,<0.42.0>]}|548]],
>>>              [],
>>>              [[{{supervisor,connector_sup,1},[connector,<0.42.0>]}|3],
>>>               [{{connector_serv,init,1},[connector_supssl,connector,<0.42.0>]}|1460],
>>>               [{{supervisor,casserl_sup,1},[connector,<0.42.0>]}|1],
>>>               [{{supervisor,pushiphone_sup,1},[connector,<0.42.0>]}|2],
>>>               [{{pushiphone,init,1},['pushiphone-lite',connector,<0.42.0>]}|3],
>>>               [{{supervisor,clientpool_sup,1},[connector,<0.42.0>]}|1]],
>>>              [],
>>>              [[{{clientpool,init,1},[clientpool_sup,connector,<0.42.0>]}|1],
>>>               [undefined|4]],
>>>              [],
>>>              [[{{supervisor,connector,1},[<0.42.0>]}|1],
>>>               [{{casserl_serv,init,1},[casserl_sup,connector,<0.42.0>]}|50]],
>>>              [],[],[],
>>>              [[{{connector_serv,init,1},[connector_suprc4,connector,<0.42.0>]}|472],
>>>               [{{ssl_connection,init,1},
>>>                 [ssl_connection_sup,ssl_sup,<0.51.0>]}|
>>>                1366],
>>>               [{unknown,unknown}|3]],
>>>              [],[],
>>>              [[{{pushiphone,init,1},['pushiphone-full',connector,<0.42.0>]}|3]],
>>>              [],
>>>              [[{{pg2,init,1},[kernel_safe_sup,kernel_sup,<0.10.0>]}|1]]}}}
>>> ok
>>>
>>> I'm very satisfied with supervisor, and I don't think I have the
>>> expertise to tweak it...
>>>
>>> On May 23, 2013, at 14:19, Dmitry Kolesnikov <dmkolesnikov@REDACTED>
>>> wrote:
>>>
>>>
>>> On May 23, 2013, at 1:04 PM, Morgan Segalis <msegalis@REDACTED> wrote:
>>>
>>> I made a little function a while back that gets all processes and
>>> removes the processes started at the beginning…
>>>
>>>
>>> Could you please elaborate on that? Why are you not satisfied with
>>> supervisor?
>>>
>>> - Dmitry