[erlang-questions] Investigate an infinite loop on production servers
Morgan Segalis
msegalis@REDACTED
Fri May 24 10:34:37 CEST 2013
Thank you, I'll look into it.
Here's what application:which_applications() gives me:
[{emysql,"Emysql - Erlang MySQL driver","0.2"},
{ssl,"Erlang/OTP SSL application","5.2.1"},
{public_key,"Public key infrastructure","0.18"},
{crypto,"CRYPTO version 2","2.3"},
{stdlib,"ERTS CXC 138 10","1.19.1"},
{kernel,"ERTS CXC 138 10","2.16.1"}]
nothing fancy as you can see...
On May 24, 2013, at 10:31, Dmitry Kolesnikov <dmkolesnikov@REDACTED> wrote:
> Hello,
>
> I am not aware of a single flag to limit the memory like in Java.
> You can try to configure memory allocation
> http://www.erlang.org/doc/man/erts_alloc.html
>
> One possible reason for the freeze is the writing of a huge crash dump.
> See the flags at the bottom of the page for how to tune that behaviour:
> http://www.erlang.org/doc/man/erl.html
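The crash-dump behaviour mentioned above is controlled through environment variables documented in erl(1); a minimal sketch (the path is illustrative, and ERL_CRASH_DUMP_SECONDS requires a reasonably recent OTP release):

```shell
# Write the crash dump to a known location with room for a large file.
export ERL_CRASH_DUMP=/var/log/myapp/erl_crash.dump
# Give up on writing the dump after 30 seconds, so a huge dump
# cannot keep the node hanging indefinitely.
export ERL_CRASH_DUMP_SECONDS=30
erl -sname prod   # plus your usual start flags
```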
>
> Switching off swap makes it easier to observe the OOM condition.
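To Morgan's question about capping memory: the VM of that era has no single "max heap" flag, but one OS-level workaround (my suggestion, not something from the thread) is to cap the address space with ulimit, so a runaway allocation makes the VM abort with a crash dump instead of dragging the whole server into swap:

```shell
# Cap the beam process's virtual address space at ~4 GB (value in kB).
# When an allocation beyond this fails, the VM aborts and writes a
# crash dump instead of freezing the machine.
ulimit -v 4194304
erl -sname prod -detached
```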
>
> Would you share with the list which applications you are running?
> application:which_applications()
>
>
> - Dmitry
>
>
> On May 24, 2013, at 11:13 AM, Morgan Segalis <msegalis@REDACTED> wrote:
>
>> The problem is that the VM freezes completely; it does not generate a crash dump.
>>
>> Is there a way to limit the memory the VM can allocate, so that the server is not overwhelmed and a crash dump can actually be created?
>>
>> On May 24, 2013, at 02:00, Vance Shipley <vances@REDACTED> wrote:
>>
>>> Have you analyzed the crash dump file with the crash dump viewer?
>>> On May 24, 2013 3:00 AM, "Morgan Segalis" <msegalis@REDACTED> wrote:
>>> Yeah, you got that right! Leaking at a huge rate at some point!
>>>
>>> - The number of FDs: I don't get close to the max
>>> # cat /proc/sys/fs/file-nr
>>> 3264 0 6455368
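Note that /proc/sys/fs/file-nr is the system-wide count; the per-process limit can be hit much earlier, so it is worth checking the beam process itself (the pgrep lookup is an assumption about the setup):

```shell
# Per-process descriptor limit in effect for new shells
ulimit -n
# How many descriptors the Erlang VM actually holds open right now
ls /proc/$(pgrep -o beam.smp)/fd | wc -l
```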
>>>
>>> - On the production server there is only the Erlang node, no other service…
>>> The beam.smp process was through the roof at 300% CPU and 97% RAM.
>>> The weird thing is that it got there in a second; I was looking at it when it happened.
>>>
>>> - It has happened with 2000 connections, 4000 connections, and 10000 connections… 5 minutes after start, 5 hours after start.
>>>
>>> I really can't find a pattern here… and I'm becoming a little desperate.
>>>
>>> Thank you for your help again.
>>>
>>> Morgan.
>>>
>>> On May 23, 2013, at 23:20, Dmitry Kolesnikov <dmkolesnikov@REDACTED> wrote:
>>>
>>>> Your system is definitely leaking some resources :-/
>>>> - Check the number of used FDs; maybe you exceeded a limit there
>>>> - What was the overall system memory/CPU utilisation before the crash?
>>>> - Check how many connections you had before the crash; maybe you can reproduce it in dev
>>>>
>>>> - Dmitry
>>>>
>>>> On May 24, 2013, at 12:13 AM, Morgan Segalis <msegalis@REDACTED> wrote:
>>>>
>>>>> Ok, it finally got into the infinite loop…
>>>>>
>>>>> And of course, the node on which I was running etop could not give me anything more, since it got disconnected from the production node.
>>>>>
>>>>> So back to square one… no way to investigate correctly so far :-/
>>>>>
>>>>> Morgan.
>>>>>
>>>>> On May 23, 2013, at 16:34, Morgan Segalis <msegalis@REDACTED> wrote:
>>>>>
>>>>>> Yeah, that's what I'm doing right now, but of course when I'm monitoring it, it won't crash, only when I sleep!!
>>>>>>
>>>>>> I'll get back to the Erlang list as soon as I have more information about this.
>>>>>>
>>>>>> Thank you all !
>>>>>>
>>>>>> Morgan.
>>>>>>
>>>>>> On May 23, 2013, at 16:30, Vance Shipley <vances@REDACTED> wrote:
>>>>>>
>>>>>>> Keep etop running and capture the output to a file (e.g. etop ... | tee etop.log). After it gets into trouble, look back and see what was happening beforehand.
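One way to set this up is to run etop from a second (hidden) node started with the same cookie, using the documented etop:start/1 options; a sketch with placeholder node name and intervals:

```erlang
%% From an Erlang shell on a monitoring node, stream a text-mode etop
%% of the production node to stdout (redirect/tee that shell's output
%% to a file so the minutes before a freeze are preserved).
etop:start([{node, 'prod@myhost'},  % placeholder node name
            {output, text},
            {interval, 10},         % refresh every 10 seconds
            {lines, 20},            % show the top 20 processes
            {sort, memory}]).       % sort by memory; also: reductions, msg_q
```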
>>>>>>> On May 23, 2013 6:16 PM, "Morgan Segalis" <msegalis@REDACTED> wrote:
>>>>>>> So I should go back to R15B?
>>>>>>>
>>>>>>> erlang:memory() gives me
>>>>>>>
>>>>>>> [{total,1525779584},
>>>>>>> {processes,1272881427},
>>>>>>> {processes_used,1272789743},
>>>>>>> {system,252898157},
>>>>>>> {atom,372217},
>>>>>>> {atom_used,346096},
>>>>>>> {binary,148093608},
>>>>>>> {code,8274446},
>>>>>>> {ets,1546832}]
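Since processes account for roughly 1.27 GB of the 1.5 GB total here, a natural next step (my suggestion, not from the thread) is to list the biggest processes on the node:

```erlang
%% Top N processes by memory; pids that died mid-scan
%% (process_info/2 -> undefined) are filtered out by the pattern.
TopMem = fun(N) ->
    Sized = [{P, M} || P <- erlang:processes(),
                       {memory, M} <- [erlang:process_info(P, memory)]],
    lists:sublist(lists:reverse(lists:keysort(2, Sized)), N)
end,
TopMem(10).
```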
>>>>>>>
>>>>>>>
>>>>>>> But keep in mind that right now there is no infinite loop or memory issue at this exact time…
>>>>>>> It would be more interesting to have this when the VM is asking for 14 GB of memory, but when it does, the console is unresponsive, so I can't get anything out of it then.
>>>>>>>
>>>>>>> On May 23, 2013, at 14:39, Dmitry Kolesnikov <dmkolesnikov@REDACTED> wrote:
>>>>>>>
>>>>>>>> Right, you do not have many processes, yet at the same time you run out of memory…
>>>>>>>>
>>>>>>>> Unfortunately, I have had no time to play around with R16B in production…
>>>>>>>> Could it be some issue with SSL? I recall there were some complaints about it on the list.
>>>>>>>>
>>>>>>>> I would use entop to spot the process that has too many reductions, too long a message queue, or too large a heap.
>>>>>>>> Once you know its pid you can dig up more info about it using erlang:process_info(…) and/or sys:get_status(…).
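Concretely, once a suspicious pid has been spotted, that looks roughly like this (the pid below is a placeholder):

```erlang
%% In the shell; pid/3 builds a pid from its printed form.
Pid = pid(0, 542, 0),  % placeholder: the pid entop pointed at
%% The usual leak indicators in one call:
erlang:process_info(Pid, [current_function, message_queue_len,
                          memory, reductions, heap_size]),
%% For gen_server/gen_fsm processes, the full internal state
%% (with an explicit 5 s timeout):
sys:get_status(Pid, 5000).
```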
>>>>>>>>
>>>>>>>> BTW, what does erlang:memory() say on your production node?
>>>>>>>>
>>>>>>>> - Dmitry
>>>>>>>>
>>>>>>>> On May 23, 2013, at 3:25 PM, Morgan Segalis <msegalis@REDACTED> wrote:
>>>>>>>>
>>>>>>>>> No, I was talking about the function I made to investigate which processes I have created, which gives me this output:
>>>>>>>>>
>>>>>>>>> Dict: {dict,16,16,16,8,80,48,
>>>>>>>>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>>>>>>>> {{[[{{connector_serv,init,1},[connector_suprc42,connector,<0.42.0>]}|548]],
>>>>>>>>> [],
>>>>>>>>> [[{{supervisor,connector_sup,1},[connector,<0.42.0>]}|3],
>>>>>>>>> [{{connector_serv,init,1},[connector_supssl,connector,<0.42.0>]}|1460],
>>>>>>>>> [{{supervisor,casserl_sup,1},[connector,<0.42.0>]}|1],
>>>>>>>>> [{{supervisor,pushiphone_sup,1},[connector,<0.42.0>]}|2],
>>>>>>>>> [{{pushiphone,init,1},['pushiphone-lite',connector,<0.42.0>]}|3],
>>>>>>>>> [{{supervisor,clientpool_sup,1},[connector,<0.42.0>]}|1]],
>>>>>>>>> [],
>>>>>>>>> [[{{clientpool,init,1},[clientpool_sup,connector,<0.42.0>]}|1],
>>>>>>>>> [undefined|4]],
>>>>>>>>> [],
>>>>>>>>> [[{{supervisor,connector,1},[<0.42.0>]}|1],
>>>>>>>>> [{{casserl_serv,init,1},[casserl_sup,connector,<0.42.0>]}|50]],
>>>>>>>>> [],[],[],
>>>>>>>>> [[{{connector_serv,init,1},[connector_suprc4,connector,<0.42.0>]}|472],
>>>>>>>>> [{{ssl_connection,init,1},
>>>>>>>>> [ssl_connection_sup,ssl_sup,<0.51.0>]}|
>>>>>>>>> 1366],
>>>>>>>>> [{unknown,unknown}|3]],
>>>>>>>>> [],[],
>>>>>>>>> [[{{pushiphone,init,1},['pushiphone-full',connector,<0.42.0>]}|3]],
>>>>>>>>> [],
>>>>>>>>> [[{{pg2,init,1},[kernel_safe_sup,kernel_sup,<0.10.0>]}|1]]}}}
>>>>>>>>> ok
>>>>>>>>>
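For reference, a helper producing a census like the one above (an assumption about Morgan's function, which was never posted) can be built from the '$initial_call' and '$ancestors' entries that OTP behaviours store in each process dictionary:

```erlang
%% Count live processes grouped by {'$initial_call', '$ancestors'};
%% processes spawned outside OTP behaviours show up as {unknown, unknown}.
Census = lists:foldl(
    fun(P, D) ->
        Key = case erlang:process_info(P, dictionary) of
                  {dictionary, Dict} ->
                      {proplists:get_value('$initial_call', Dict, unknown),
                       proplists:get_value('$ancestors', Dict, unknown)};
                  undefined ->
                      dead   % process exited mid-scan
              end,
        dict:update_counter(Key, 1, D)
    end, dict:new(), erlang:processes()),
dict:to_list(Census).
```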
>>>>>>>>> I'm very satisfied with supervisor, and I don't think I have the expertise to tweak it...
>>>>>>>>>
>>>>>>>>> On May 23, 2013, at 14:19, Dmitry Kolesnikov <dmkolesnikov@REDACTED> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On May 23, 2013, at 1:04 PM, Morgan Segalis <msegalis@REDACTED> wrote:
>>>>>>>>>>
>>>>>>>>>>> I made a little function a while back that gets all processes and filters out the ones started at boot…
>>>>>>>>>>
>>>>>>>>>> Could you please elaborate on that? Why are you not satisfied with supervisor?
>>>>>>>>>>
>>>>>>>>>> - Dmitry
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> erlang-questions mailing list
>>>>>>> erlang-questions@REDACTED
>>>>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>