[erlang-questions] Investigate an infinite loop on production servers

Morgan Segalis msegalis@REDACTED
Thu May 23 23:30:10 CEST 2013


Yeah, you got that right! It is leaking at a huge rate at some point!

- The number of FDs - I don't get close to the max (a cross-check from inside the node is sketched just after this list):
# cat /proc/sys/fs/file-nr
3264    0       6455368

- On the production server there is only the Erlang node, no other service…
The beam.smp was through the roof at 300% CPU and 97% RAM.
The weird thing is that it got there in a second; I was looking at it when it happened.

- It has happened with 2000 connections, 4000 connections, and 10000 connections… 5 minutes after start, 5 hours after start.
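
For what it's worth, the cross-check mentioned above can be run from inside the node, something like this (untested sketch; the function name is just a placeholder):

%% Compare the VM's own view (ports = sockets/files owned by the node)
%% with what the OS reports for the beam.smp process under /proc/<pid>/fd.
fd_usage() ->
    OsPid = os:getpid(),   % OS pid of the emulator, as a string
    {ok, Fds} = file:list_dir("/proc/" ++ OsPid ++ "/fd"),
    [{erlang_ports, length(erlang:ports())},
     {erlang_processes, length(erlang:processes())},
     {os_fds, length(Fds)}].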

I really can't find a pattern here… and I'm becoming a little desperate.

Thank you for your help again.

Morgan.

Le 23 mai 2013 à 23:20, Dmitry Kolesnikov <dmkolesnikov@REDACTED> a écrit :

> Your system is definitely leaking some resources :-/
>  - Check the number of used FDs, maybe you exceeded the limit there
>  - What was the overall system memory / CPU utilisation before the crash?
>  - Check how many connections you had before the crash, maybe you can reproduce it in dev
> 
> - Dmitry
> 
> On May 24, 2013, at 12:13 AM, Morgan Segalis <msegalis@REDACTED> wrote:
> 
>> Ok, it finally got into the infinite loop…
>> 
>> And of course, the node on which I was running etop could not give me anything anymore, since it got disconnected from the production node.
>> 
>> So back to square one… no way to investigate correctly so far :-/
>> 
>> Morgan.
>> 
>> Le 23 mai 2013 à 16:34, Morgan Segalis <msegalis@REDACTED> a écrit :
>> 
>>> Yeah, that's what I'm doing right now, but of course, when I'm monitoring it, it won't crash, only when I sleep!!
>>> 
>>> I'll get back to the Erlang list as soon as I have more information about this.
>>> 
>>> Thank you all !
>>> 
>>> Morgan.
>>> 
>>> Le 23 mai 2013 à 16:30, Vance Shipley <vances@REDACTED> a écrit :
>>> 
>>>> Keep etop running and capture the output to a file (e.g. etop ... | tee stop.log). After it gets into trouble look back and see what was happening beforehand.
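>>>>
>>>> Something along these lines, from a second hidden node so etop keeps running even if the production node gets into trouble (node name, cookie and options here are only examples):
>>>>
>>>> $ erl -sname etop_mon -setcookie SECRET -hidden
>>>> 1> etop:start([{node, 'prod@host'}, {interval, 5}, {lines, 20}, {sort, memory}]).
>>>>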
>>>> On May 23, 2013 6:16 PM, "Morgan Segalis" <msegalis@REDACTED> wrote:
>>>> So I should go back to R15B?
>>>> 
>>>> erlang:memory() gives me 
>>>> 
>>>> [{total,1525779584},
>>>>  {processes,1272881427},
>>>>  {processes_used,1272789743},
>>>>  {system,252898157},
>>>>  {atom,372217},
>>>>  {atom_used,346096},
>>>>  {binary,148093608},
>>>>  {code,8274446},
>>>>  {ets,1546832}]
>>>> 
>>>> 
>>>> But keep in mind that right now there is no infinite loop or memory issue at this exact time…
>>>> It would be more interesting to have that when the VM is asking for 14GB of memory, but when it does, the console is unresponsive, so I can't get anything then.
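>>>>
>>>> Maybe I'll leave something like this running in the node, so at least the numbers end up on disk before it becomes unresponsive (untested sketch; module name and log path are just placeholders):
>>>>
>>>> %% Every 10 seconds, append erlang:memory() plus process and port
>>>> %% counts to a log file.
>>>> -module(mem_dump).
>>>> -export([start/0]).
>>>>
>>>> start() ->
>>>>     spawn(fun loop/0).
>>>>
>>>> loop() ->
>>>>     Line = io_lib:format("~p ~p processes=~p ports=~p~n",
>>>>                          [os:timestamp(), erlang:memory(),
>>>>                           length(erlang:processes()), length(erlang:ports())]),
>>>>     ok = file:write_file("/tmp/mem_dump.log", Line, [append]),
>>>>     timer:sleep(10000),
>>>>     loop().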
>>>> 
>>>> Le 23 mai 2013 à 14:39, Dmitry Kolesnikov <dmkolesnikov@REDACTED> a écrit :
>>>> 
>>>>> Right, you do not have many processes. At the same time you run out of memory…
>>>>> 
>>>>> Unfortunately, I have had no time to play around with R16B in production…
>>>>> Could it be some issue with SSL? I recall there were some complaints on the list.
>>>>> 
>>>>> I would use entop to spot the processes that have either too many reductions, too long a message queue, or too big a heap.
>>>>> Once you know their pids you can dig up more info about them using erlang:process_info(…) and/or sys:get_status(…).
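>>>>>
>>>>> Roughly like this, if you want to do it without entop (untested; Key is one of reductions | message_queue_len | memory):
>>>>>
>>>>> top(Key, N) ->
>>>>>     %% the {_, V} generator pattern silently skips processes that died
>>>>>     %% between erlang:processes() and process_info/2
>>>>>     L = [{P, V} || P <- erlang:processes(),
>>>>>                    {_, V} <- [erlang:process_info(P, Key)]],
>>>>>     lists:sublist(lists:reverse(lists:keysort(2, L)), N).
>>>>>
>>>>> %% e.g. top(message_queue_len, 10), then erlang:process_info(Pid)
>>>>> %% and sys:get_status(Pid) on the suspects.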
>>>>> 
>>>>> BTW, what does erlang:memory() say on your production node?
>>>>> 
>>>>> - Dmitry
>>>>> 
>>>>> On May 23, 2013, at 3:25 PM, Morgan Segalis <msegalis@REDACTED> wrote:
>>>>> 
>>>>>> No, I was talking about the function I made to investigate which processes I have created, which gives me this output:
>>>>>> 
>>>>>> Dict: {dict,16,16,16,8,80,48,
>>>>>>            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>>>>>            {{[[{{connector_serv,init,1},[connector_suprc42,connector,<0.42.0>]}|548]],
>>>>>>              [],
>>>>>>              [[{{supervisor,connector_sup,1},[connector,<0.42.0>]}|3],
>>>>>>               [{{connector_serv,init,1},[connector_supssl,connector,<0.42.0>]}|1460],
>>>>>>               [{{supervisor,casserl_sup,1},[connector,<0.42.0>]}|1],
>>>>>>               [{{supervisor,pushiphone_sup,1},[connector,<0.42.0>]}|2],
>>>>>>               [{{pushiphone,init,1},['pushiphone-lite',connector,<0.42.0>]}|3],
>>>>>>               [{{supervisor,clientpool_sup,1},[connector,<0.42.0>]}|1]],
>>>>>>              [],
>>>>>>              [[{{clientpool,init,1},[clientpool_sup,connector,<0.42.0>]}|1],
>>>>>>               [undefined|4]],
>>>>>>              [],
>>>>>>              [[{{supervisor,connector,1},[<0.42.0>]}|1],
>>>>>>               [{{casserl_serv,init,1},[casserl_sup,connector,<0.42.0>]}|50]],
>>>>>>              [],[],[],
>>>>>>              [[{{connector_serv,init,1},[connector_suprc4,connector,<0.42.0>]}|472],
>>>>>>               [{{ssl_connection,init,1},
>>>>>>                 [ssl_connection_sup,ssl_sup,<0.51.0>]}|
>>>>>>                1366],
>>>>>>               [{unknown,unknown}|3]],
>>>>>>              [],[],
>>>>>>              [[{{pushiphone,init,1},['pushiphone-full',connector,<0.42.0>]}|3]],
>>>>>>              [],
>>>>>>              [[{{pg2,init,1},[kernel_safe_sup,kernel_sup,<0.10.0>]}|1]]}}}
>>>>>> ok
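>>>>>>
>>>>>> The idea behind it is roughly along these lines (just a sketch, not the exact code): group the live processes by their '$initial_call' and '$ancestors' and count each group in a dict.
>>>>>>
>>>>>> count_by_origin() ->
>>>>>>     lists:foldl(
>>>>>>       fun(P, D) ->
>>>>>>               Key = case erlang:process_info(P, [dictionary, initial_call]) of
>>>>>>                         undefined ->
>>>>>>                             {unknown, unknown};   % process died meanwhile
>>>>>>                         [{dictionary, Dict}, {initial_call, IC}] ->
>>>>>>                             {proplists:get_value('$initial_call', Dict, IC),
>>>>>>                              proplists:get_value('$ancestors', Dict, [])}
>>>>>>                     end,
>>>>>>               dict:update_counter(Key, 1, D)
>>>>>>       end, dict:new(), erlang:processes()).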
>>>>>> 
>>>>>> I'm very satisfied with supervisor, and I don't think I have the expertise to tweak it...
>>>>>> 
>>>>>> Le 23 mai 2013 à 14:19, Dmitry Kolesnikov <dmkolesnikov@REDACTED> a écrit :
>>>>>> 
>>>>>>> 
>>>>>>> On May 23, 2013, at 1:04 PM, Morgan Segalis <msegalis@REDACTED> wrote:
>>>>>>> 
>>>>>>>> I made a little function a while back that gets all processes and removes the processes that were started at boot…
>>>>>>> 
>>>>>>> Could you please elaborate on that? Why are you not satisfied with supervisor?
>>>>>>> 
>>>>>>> - Dmitry 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> erlang-questions mailing list
>>>> erlang-questions@REDACTED
>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>> 
>>> 
>> 
> 


