[erlang-questions] R17 - Possible wedged scheduler

Matthew Evans mattevans123@REDACTED
Fri Dec 9 17:35:21 CET 2016


It happened again; it appears that code_server is wedged:


admin@REDACTED:~$ doErlangFun "erlang:process_info(whereis(code_server))."
[{registered_name,code_server},
 {current_function,{code_server,cpc_recv,4}},
 {initial_call,{erlang,apply,2}},
 {status,waiting},
 {message_queue_len,23},
 {messages,[{code_call,<6805.4097.0>,{ensure_loaded,switch_type_module}},
            {code_call,<6805.4146.0>,{ensure_loaded,switch_type_module}},
            {code_call,<6805.941.0>,{ensure_loaded,pc_port_autoneg}},
            {code_call,<6805.541.0>,{ensure_loaded,plexxiStatistics_types}},
            {code_call,<6805.520.0>,{ensure_loaded,switch_type_module}},
            {code_call,<6805.5123.0>,{ensure_loaded,secondary_erlang_node}},
            {code_call,<6805.5122.0>,{ensure_loaded,secondary_erlang_node}},
            {code_call,<6805.5162.0>,{ensure_loaded,icmp}},
            {code_call,<6805.5321.0>,
                       {ensure_loaded,mac_entries_record_handler}},
            {code_call,<6805.5483.0>,{ensure_loaded,icmp}},
            {code_call,<6805.6647.0>,{ensure_loaded,icmp}},
            {code_call,<6805.7232.0>,{ensure_loaded,icmp}},
            {code_call,<6805.7274.0>,{ensure_loaded,icmp}},
            {code_call,<6805.7304.0>,{ensure_loaded,icmp}},
            {code_call,<6805.8889.0>,
                       {ensure_loaded,mac_entries_record_handler}},
            {code_call,<6805.8951.0>,
                       {ensure_loaded,mac_entries_record_handler}},
            {code_call,<6805.576.0>,
                       {ensure_loaded,cross_connect_unicast_utils}},
            {code_call,<6805.19300.12>,{ensure_loaded,shell}},
            {code_call,<6805.20313.12>,{ensure_loaded,shell}},
            {code_call,<6805.21339.12>,{ensure_loaded,dbg}},
            {code_call,<6805.31109.13>,get_mode},
            {code_call,<6805.1255.14>,get_mode},
            {system,{<6805.2521.14>,#Ref<6805.0.23.35356>},get_status}]},
 {links,[<6805.11.0>]},
 {dictionary,[{any_native_code_loaded,false}]},
 {trap_exit,true},
 {error_handler,error_handler},
 {priority,normal},
 {group_leader,<6805.9.0>},
 {total_heap_size,86071},
 {heap_size,10958},
 {stack_size,25},
 {reductions,13172282},
 {garbage_collection,[{min_bin_vheap_size,46422},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,71}]},
 {suspending,[]}]
admin@REDACTED:~$
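
Since the symptom is a code_server that stops answering while its message queue grows, one possible way to detect it from inside the node is a liveness probe that makes a trivial code server request from a throwaway process and gives up after a timeout. The following is only a rough sketch under that assumption: the module name code_server_probe and the functions check/1 and snapshot/0 are hypothetical and not part of OTP, and code:get_path/0 is used only because it is a cheap request that goes through code_server.

```erlang
-module(code_server_probe).
-export([check/1]).

%% Hypothetical helper (not part of OTP).
%% Returns ok if code_server answers a trivial request within TimeoutMs,
%% {wedged, Info} if it does not, or {error, Reason} if the probe crashed.
check(TimeoutMs) ->
    %% Run the request in a separate process so that a stuck code_server
    %% cannot block the caller.
    {Pid, Ref} = spawn_monitor(fun() ->
                                       _ = code:get_path(),
                                       exit(probe_ok)
                               end),
    receive
        {'DOWN', Ref, process, Pid, probe_ok} ->
            ok;
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    after TimeoutMs ->
        %% No answer in time: discard the probe and report what
        %% code_server is doing right now.
        exit(Pid, kill),
        erlang:demonitor(Ref, [flush]),
        {wedged, snapshot()}
    end.

%% Collect only the cheap items; the full message list can be large.
snapshot() ->
    case whereis(code_server) of
        undefined ->
            not_running;
        CodeServer ->
            erlang:process_info(CodeServer,
                                [status, current_function, message_queue_len])
    end.
```

Called periodically, for example as code_server_probe:check(5000), a {wedged, Info} result could be logged, or used to stop the node deliberately so that heart restarts it. Whether that is the right reaction depends on the application.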



________________________________
From: erlang-questions-bounces@REDACTED <erlang-questions-bounces@REDACTED> on behalf of Matthew Evans <mattevans123@REDACTED>
Sent: Friday, December 9, 2016 9:56 AM
To: Erlang/OTP discussions
Subject: [erlang-questions] R17 - Possible wedged scheduler


Hi,


We just hit a situation where it appeared that one scheduler was wedged. Some parts of our application were working, but others appeared to be stuck. I could connect via a cnode application and an escript, but I couldn't connect via the Erlang shell. We have an escript that does rpc calls; some worked, but others (e.g. anything involving the code server or tracing) failed.


CPU load was minimal at the time, and heart didn't complain. We have only a single NIF, and it is not called on this hardware variant. We do use cnodes to talk to C applications.


We are running R17 on a quad-core Intel CPU under Debian.


This is the first time this has been seen, so the questions are:


1. Has anyone seen this before?

2. What can we do to debug this condition if we hit it again in the future?

3. Since heart doesn't detect this, can anyone think of any alternative mechanisms?


Thanks


Matt