[erlang-questions] Investigate an infinite loop on production servers

Morgan Segalis msegalis@REDACTED
Thu May 23 12:04:11 CEST 2013


For more information, here's what my Erlang node is doing:

It is an instant messaging server; each connected client is a process spawned automatically by a supervisor.

Every spawned process is started and monitored by a supervisor.

A while back I wrote a small function that gets all processes and filters out the ones initialized at startup.

Here's what it returns when everything is working fine:

Dict: {dict,16,16,16,8,80,48,
            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
            {{[[{{connector_serv,init,1},[connector_suprc42,connector,<0.42.0>]}|548]],
              [],
              [[{{supervisor,connector_sup,1},[connector,<0.42.0>]}|3],
               [{{connector_serv,init,1},[connector_supssl,connector,<0.42.0>]}|1460],
               [{{supervisor,casserl_sup,1},[connector,<0.42.0>]}|1],
               [{{supervisor,pushiphone_sup,1},[connector,<0.42.0>]}|2],
               [{{pushiphone,init,1},['pushiphone-lite',connector,<0.42.0>]}|3],
               [{{supervisor,clientpool_sup,1},[connector,<0.42.0>]}|1]],
              [],
              [[{{clientpool,init,1},[clientpool_sup,connector,<0.42.0>]}|1],
               [undefined|4]],
              [],
              [[{{supervisor,connector,1},[<0.42.0>]}|1],
               [{{casserl_serv,init,1},[casserl_sup,connector,<0.42.0>]}|50]],
              [],[],[],
              [[{{connector_serv,init,1},[connector_suprc4,connector,<0.42.0>]}|472],
               [{{ssl_connection,init,1},
                 [ssl_connection_sup,ssl_sup,<0.51.0>]}|
                1366],
               [{unknown,unknown}|3]],
              [],[],
              [[{{pushiphone,init,1},['pushiphone-full',connector,<0.42.0>]}|3]],
              [],
              [[{{pg2,init,1},[kernel_safe_sup,kernel_sup,<0.10.0>]}|1]]}}}
ok
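The function itself isn't included in the message. Judging only from the shape of the output (keys pairing an initial call with an ancestor list, mapped to counts), something similar could be sketched as below. This assumes the processes were started through proc_lib, which stores '$initial_call' and '$ancestors' in the process dictionary. The module name proc_count and the idea of passing a list of boot-time pids to exclude are hypothetical; processes not started via proc_lib land under {undefined, undefined}, so the keys will not match the dump above exactly.

```erlang
%% Hypothetical module: count live processes grouped by
%% {InitialCall, Ancestors}, skipping a caller-supplied list of pids
%% (for example a snapshot of erlang:processes() taken at boot).
-module(proc_count).
-export([count/1]).

%% Exclude :: [pid()]. Returns a dict mapping {InitialCall, Ancestors}
%% to the number of processes sharing that key.
count(Exclude) ->
    Pids = erlang:processes() -- Exclude,
    lists:foldl(fun(Pid, Acc) ->
                        case key(Pid) of
                            gone -> Acc;  % process exited while scanning
                            Key  -> dict:update_counter(Key, 1, Acc)
                        end
                end, dict:new(), Pids).

%% proc_lib-started processes carry '$initial_call' and '$ancestors'
%% in their process dictionary; other processes yield undefined values.
key(Pid) ->
    case erlang:process_info(Pid, dictionary) of
        undefined -> gone;
        {dictionary, D} ->
            {proplists:get_value('$initial_call', D),
             proplists:get_value('$ancestors', D)}
    end.
```

Taking the snapshot right after the application starts, then calling proc_count:count(Snapshot) later, shows which groups (for example connector_serv children) are growing.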



On 23 May 2013, at 11:50, Dmitry Kolesnikov <dmkolesnikov@REDACTED> wrote:

> Which means that you are using proc_lib heavily...
> Are those the top processes by reductions, message queue size, or heap?
> 
> Try connecting to the node and gathering more info about those processes using
> erlang:process_info(…) or sys:get_status(…).
> 
> - Dmitry
> 
> On May 23, 2013, at 12:35 PM, Morgan Segalis <msegalis@REDACTED> wrote:
> 
>> Never mind, I got it…
>> 
>> However, I'm not getting much information…
>> 
>> Most of the processes are in proc_lib:init_p/5.
>> 
>> On 23 May 2013, at 11:23, Morgan Segalis <msegalis@REDACTED> wrote:
>> 
>>> Apparently I'm monitoring my own node…
>>> 
>>> Does anyone know how to monitor an external cluster node with etop?
>>> 
>>> On 23 May 2013, at 11:13, Morgan Segalis <msegalis@REDACTED> wrote:
>>> 
>>>> I have launched etop on my computer to monitor the production server, hoping I will spot something wrong!
>>>> 
>>>> Thank you for your help so far (to All).
>>>> 
>>>> I'll get back to you as soon as I have more information from etop.
>>>> 
>>>> Morgan.
>>>> 
>>>> On 23 May 2013, at 07:38, Vance Shipley <vances@REDACTED> wrote:
>>>> 
>>>>> On Thu, May 23, 2013 at 04:00:07AM +0200, Morgan Segalis wrote:
>>>>> }  I'm having a bit of an issue with my production servers.
>>>>> 
>>>>> You will find that etop is your friend:
>>>>> 
>>>>> 	http://www.erlang.org/doc/apps/observer/etop_ug.html
>>>>> 
>>>>> Run etop from the command line and sort on the column you're
>>>>> interested in.  To watch memory usage:
>>>>> 
>>>>> 	etop -node tiger@REDACTED -sort memory
>>>>> 
>>>>> This will list the processes by memory size in decreasing order,
>>>>> which shows you the memory hogs.  Watch it as the node starts to
>>>>> get into trouble and you should see where the memory is going.
>>>>> 
>>>>> As Bob points out, the most common problem is that a process's
>>>>> inbox starts to fill up.  Once this starts happening it's
>>>>> the beginning of the end.  Another process may start eating up
>>>>> memory, and the node may crash because it has requested more than
>>>>> is available, but the root cause was that one process could not
>>>>> service messages at the rate they were received.
>>>>> 
>>>>> To watch for message queue lengths:
>>>>> 
>>>>> 	etop -node tiger@REDACTED -sort msg_q
>>>>> 
>>>>> The above will list the processes in decreasing order of inbox
>>>>> size.  Normally they should all be zero, or occasionally one.  If
>>>>> you have a problem, you'll see one process stay at the top, and its
>>>>> message queue length will start to grow over time.
>>>>> 
>>>>> -- 
>>>>> 	-Vance
>>>> 
>>> 
>> 
> 
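Dmitry's suggestion to query erlang:process_info/2 directly and Vance's advice to sort by message queue length can be combined from a shell on the production node, without etop. A minimal sketch, not how etop itself is implemented; the module name queue_top is hypothetical, and the module would need to be loaded on the node being inspected.

```erlang
%% Hypothetical module: list the N processes with the longest
%% message queues, along with the function each is currently in.
-module(queue_top).
-export([top/1]).

%% Returns [{Pid, QueueLen, CurrentFunction}], longest queue first.
top(N) ->
    Info = [{Pid, Len, Cur}
            || Pid <- erlang:processes(),
               %% process_info/2 returns undefined for a dead process;
               %% the pattern below silently skips those.
               [{message_queue_len, Len}, {current_function, Cur}]
                   <- [erlang:process_info(Pid, [message_queue_len,
                                                 current_function])]],
    lists:sublist(lists:reverse(lists:keysort(2, Info)), N).
```

Calling queue_top:top(10) a few times in a row shows whether one process stays at the top with a growing queue, which is the symptom Vance describes; its current_function hints at where it is stuck.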



