[erlang-questions] Investigate an infinite loop on production servers

Morgan Segalis msegalis@REDACTED
Fri May 24 10:35:48 CEST 2013


Thanks!

I'll try that…

Will keep you posted.

On 24 May 2013 at 10:30, masked.prize@REDACTED wrote:

> You can run a little function that writes process info to files every N seconds. Like this:
> F = fun(F2, T) ->
>         %% One file per snapshot, named after the current time in seconds.
>         Seconds = calendar:datetime_to_gregorian_seconds(calendar:now_to_local_time(now())),
>         Fname = lists:flatten(io_lib:format("/tmp/f-~p", [Seconds])),
>         %% Append the process_info of every live process to that file.
>         [begin
>              Info = process_info(Pid),
>              Data = io_lib:format("~p:~n~p~n", [Pid, Info]),
>              file:write_file(Fname, Data, [append])
>          end || Pid <- processes()],
>         timer:sleep(T),
>         F2(F2, T)
>     end.
> 
> Run it from the console with F(F, 5000) and you will get a bunch of files in /tmp that can probably tell you something useful.
> 
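A lighter variant of the idea above, as a rough sketch only: it records just the N largest processes per snapshot and is meant to run in its own process so the shell stays free. Snap, Loop and the /tmp/top- file prefix are hypothetical names chosen for illustration, not something from the thread.

```erlang
%% Sketch: write the N largest processes (by memory) to a timestamped file.
%% Snap, Loop and the "/tmp/top-" prefix are hypothetical names.
Snap = fun(N) ->
    Rows = [{Mem, Pid, process_info(Pid, [current_function, message_queue_len])}
            || Pid <- processes(),
               %% Processes that exited in the meantime return undefined
               %% and are filtered out by the pattern.
               {memory, Mem} <- [process_info(Pid, memory)]],
    Top = lists:sublist(lists:reverse(lists:sort(Rows)), N),
    {Mega, Sec, _} = os:timestamp(),
    Fname = lists:flatten(io_lib:format("/tmp/top-~p", [Mega * 1000000 + Sec])),
    ok = file:write_file(Fname, io_lib:format("~p~n", [Top])),
    Fname
end.

%% Repeat every T milliseconds; pass the fun to itself to recurse.
Loop = fun(L, N, T) -> Snap(N), timer:sleep(T), L(L, N, T) end.
```

It could be started in the background with spawn(fun() -> Loop(Loop, 20, 5000) end), so that the files keep accumulating even if the shell later becomes unresponsive.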
> On Friday, May 24, 2013 1:13:29 AM UTC+4, Morgan Segalis wrote:
> Ok, it finally got into the infinite loop…
> 
> And of course, the node on which I was running etop could not give me anything more, since it got disconnected from the production node.
> 
> So back to square one… no way to investigate correctly so far :-/
> 
> Morgan.
> 
> On 23 May 2013 at 16:34, Morgan Segalis <mseg...@REDACTED> wrote:
> 
>> Yeah, that's what I'm doing right now, but of course, when I'm monitoring it, it won't crash, only when I sleep!!
>> 
>> I'll get back to the Erlang list as soon as I have more information about this.
>> 
>> Thank you all !
>> 
>> Morgan.
>> 
>> On 23 May 2013 at 16:30, Vance Shipley <van...@REDACTED> wrote:
>> 
>>> Keep etop running and capture the output to a file (e.g. etop ... | tee stop.log). After it gets into trouble look back and see what was happening beforehand.
>>> On May 23, 2013 6:16 PM, "Morgan Segalis" <mseg...@REDACTED> wrote:
>>> So I should go back to R15B ?
>>> 
>>> erlang:memory() gives me 
>>> 
>>> [{total,1525779584},
>>>  {processes,1272881427},
>>>  {processes_used,1272789743},
>>>  {system,252898157},
>>>  {atom,372217},
>>>  {atom_used,346096},
>>>  {binary,148093608},
>>>  {code,8274446},
>>>  {ets,1546832}]
>>> 
>>> 
>>> But keep in mind that right now there is no infinite loop or memory issue…
>>> It would be more interesting to have that when the VM is asking for 14GB of memory, but when it does, the console is unresponsive, so I can't get anything then.
>>> 
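To put the figures quoted above in proportion, here is a small shell sketch that expresses each category as a percentage of the total. MemShare is a hypothetical helper name, and Mem copies only some of the quoted categories; the sketch assumes a list in the format returned by erlang:memory().

```erlang
%% Sketch: each erlang:memory() category as a percentage of total,
%% rounded to one decimal. MemShare is a hypothetical name.
MemShare = fun(Mem) ->
    Total = proplists:get_value(total, Mem),
    [{Key, round(Bytes * 1000 / Total) / 10}
     || {Key, Bytes} <- Mem, Key =/= total]
end.

%% Some of the figures quoted in the message above.
Mem = [{total,1525779584}, {processes,1272881427}, {system,252898157},
       {binary,148093608}, {ets,1546832}].
MemShare(Mem).
```

With these numbers, process heaps come out at roughly 83% of the total and binaries at under 10%, which suggests looking at individual process heaps first, at least for this healthy-state snapshot.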
>>> On 23 May 2013 at 14:39, Dmitry Kolesnikov <dmkole...@REDACTED> wrote:
>>> 
>>>> Right, you do not have many processes. At the same time, you are running out of memory…
>>>> 
>>>> Unfortunately, I have had no time to play around with R16B in production…
>>>> Could it be some issue with SSL? I recall there were some complaints on the list.
>>>> 
>>>> I would use entop to spot the process that has too many reductions, too long a message queue, or too large a heap.
>>>> Once you know the pid, you can dig up more info about it using erlang:process_info(…) and/or sys:get_status(…)
>>>> 
>>>> BTW, what does erlang:memory() say on your production node?
>>>> 
>>>> - Dmitry
>>>> 
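When entop is not at hand, a similar ranking can be approximated from the shell. The following is only a sketch; TopBy is a hypothetical helper name, not part of OTP, and the item is assumed to be one whose value is a number, such as memory, reductions or message_queue_len.

```erlang
%% Sketch: the N processes with the largest value of one process_info item.
%% TopBy is a hypothetical helper name.
TopBy = fun(Item, N) ->
    Rows = [{Value, Pid}
            || Pid <- processes(),
               %% process_info/2 returns undefined for a process that has
               %% already exited; the pattern below skips those.
               {_, Value} <- [process_info(Pid, Item)]],
    lists:sublist(lists:reverse(lists:sort(Rows)), N)
end.
```

For example, TopBy(message_queue_len, 5) returns up to five {Value, Pid} pairs, and each pid can then be inspected further with erlang:process_info(Pid, current_function) or, for OTP behaviours, sys:get_status(Pid).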
>>>> On May 23, 2013, at 3:25 PM, Morgan Segalis <mseg...@REDACTED> wrote:
>>>> 
>>>>> No, I was talking about the function I made to investigate which processes I have created, which gives me this output : 
>>>>> 
>>>>> Dict: {dict,16,16,16,8,80,48,
>>>>>            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>>>>            {{[[{{connector_serv,init,1},[connector_suprc42,connector,<0.42.0>]}|548]],
>>>>>              [],
>>>>>              [[{{supervisor,connector_sup,1},[connector,<0.42.0>]}|3],
>>>>>               [{{connector_serv,init,1},[connector_supssl,connector,<0.42.0>]}|1460],
>>>>>               [{{supervisor,casserl_sup,1},[connector,<0.42.0>]}|1],
>>>>>               [{{supervisor,pushiphone_sup,1},[connector,<0.42.0>]}|2],
>>>>>               [{{pushiphone,init,1},['pushiphone-lite',connector,<0.42.0>]}|3],
>>>>>               [{{supervisor,clientpool_sup,1},[connector,<0.42.0>]}|1]],
>>>>>              [],
>>>>>              [[{{clientpool,init,1},[clientpool_sup,connector,<0.42.0>]}|1],
>>>>>               [undefined|4]],
>>>>>              [],
>>>>>              [[{{supervisor,connector,1},[<0.42.0>]}|1],
>>>>>               [{{casserl_serv,init,1},[casserl_sup,connector,<0.42.0>]}|50]],
>>>>>              [],[],[],
>>>>>              [[{{connector_serv,init,1},[connector_suprc4,connector,<0.42.0>]}|472],
>>>>>               [{{ssl_connection,init,1},
>>>>>                 [ssl_connection_sup,ssl_sup,<0.51.0>]}|
>>>>>                1366],
>>>>>               [{unknown,unknown}|3]],
>>>>>              [],[],
>>>>>              [[{{pushiphone,init,1},['pushiphone-full',connector,<0.42.0>]}|3]],
>>>>>              [],
>>>>>              [[{{pg2,init,1},[kernel_safe_sup,kernel_sup,<0.10.0>]}|1]]}}}
>>>>> ok
>>>>> 
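The function that produced the output above is not shown in the thread. A rough sketch of how such a count could be obtained, assuming the keys are the translated initial call plus the '$ancestors' entry of the process dictionary, might look like this. CountByOrigin is a hypothetical name.

```erlang
%% Sketch: count live processes grouped by {InitialCall, Ancestors}.
%% CountByOrigin is a hypothetical name; the grouping key is an assumption
%% about how the output quoted above was produced.
CountByOrigin = fun() ->
    Add = fun(Pid, Acc) ->
        case process_info(Pid, dictionary) of
            {dictionary, Dict} ->
                Key = {proc_lib:translate_initial_call(Pid),
                       proplists:get_value('$ancestors', Dict, [])},
                dict:update_counter(Key, 1, Acc);
            undefined ->
                %% The process exited after processes() was called.
                Acc
        end
    end,
    Counts = dict:to_list(lists:foldl(Add, dict:new(), processes())),
    %% Largest groups first.
    lists:reverse(lists:keysort(2, Counts))
end.
```

Calling CountByOrigin() returns a list of {Key, Count} pairs sorted by descending count, which makes a sudden growth of one kind of process easier to see than the raw dict structure.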
>>>>> I'm very satisfied with supervisor, and I don't think I have the expertise to tweak it...
>>>>> 
>>>>> On 23 May 2013 at 14:19, Dmitry Kolesnikov <dmkole...@REDACTED> wrote:
>>>>> 
>>>>>> 
>>>>>> On May 23, 2013, at 1:04 PM, Morgan Segalis <mseg...@REDACTED> wrote:
>>>>>> 
>>>>>>> I made a little function a while back that gets all processes and removes the ones started at initialization…
>>>>>> 
>>>>>> Could you please elaborate on that? Why are you not satisfied with supervisor?
>>>>>> 
>>>>>> - Dmitry 
>>>>> 
>>>> 
>>> 
>>> 
>> 
> 
