[erlang-questions] R17 - Possible wedged scheduler

Sverker Eriksson sverker.eriksson@REDACTED
Mon Dec 12 15:00:55 CET 2016


My first advice is to try a newer release:
1. If this is a bug, it might have been fixed in a newer release,
as quite a lot has changed in the purging machinery, especially
in OTP-19.
2. The probability of getting help with troubleshooting from the Erlang/OTP team
increases with the OTP version number.


Having said that:
It looks like the code_server is doing a code purge operation.

If it is indeed hanging in code_server:cpc_recv/4, then a crash dump
would help with debugging. You can force one with:

erlang:halt("Crash it!").
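Before forcing the dump, it may be worth confirming that code_server really is stuck rather than just busy. A minimal sketch, assuming nothing beyond erlang:process_info/2 and proplists; the module name code_server_check and its functions are made up for illustration and are not part of OTP:

```erlang
-module(code_server_check).
-export([snapshot/0, looks_wedged/1]).

%% Hypothetical helper, not part of OTP.
%% Returns a short property list describing the code_server process,
%% or 'undefined' if it is not registered or has already exited.
snapshot() ->
    case whereis(code_server) of
        undefined ->
            undefined;
        Pid ->
            erlang:process_info(Pid, [current_function,
                                      status,
                                      message_queue_len])
    end.

%% Heuristic only: true when the snapshot shows the process sitting in
%% code_server:cpc_recv/4 with at least one request queued behind it.
%% A single positive result does not prove a hang; compare two
%% snapshots taken some seconds apart.
looks_wedged(undefined) ->
    false;
looks_wedged(Info) when is_list(Info) ->
    InCpcRecv = proplists:get_value(current_function, Info)
                    =:= {code_server, cpc_recv, 4},
    Queued = proplists:get_value(message_queue_len, Info, 0) > 0,
    InCpcRecv andalso Queued.
```

If two snapshots taken a few seconds apart both look wedged and the queue has not drained, calling erlang:halt/1 with a slogan string as above produces the dump. It is written to erl_crash.dump in the node's working directory unless ERL_CRASH_DUMP points elsewhere.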


/Sverker, Erlang/OTP




On 12/09/2016 05:35 PM, Matthew Evans wrote:
> Happened again, it appears that code_server is wedged:
>
> admin@REDACTED:~$ doErlangFun "erlang:process_info(whereis(code_server))."
> [{registered_name,code_server},
>   {current_function,{code_server,cpc_recv,4}},
>   {initial_call,{erlang,apply,2}},
>   {status,waiting},
>   {message_queue_len,23},
>   {messages,[{code_call,<6805.4097.0>,{ensure_loaded,switch_type_module}},
>              {code_call,<6805.4146.0>,{ensure_loaded,switch_type_module}},
>              {code_call,<6805.941.0>,{ensure_loaded,pc_port_autoneg}},
>              {code_call,<6805.541.0>,{ensure_loaded,plexxiStatistics_types}},
>              {code_call,<6805.520.0>,{ensure_loaded,switch_type_module}},
>              {code_call,<6805.5123.0>,{ensure_loaded,secondary_erlang_node}},
>              {code_call,<6805.5122.0>,{ensure_loaded,secondary_erlang_node}},
>              {code_call,<6805.5162.0>,{ensure_loaded,icmp}},
>              {code_call,<6805.5321.0>,
>                         {ensure_loaded,mac_entries_record_handler}},
>              {code_call,<6805.5483.0>,{ensure_loaded,icmp}},
>              {code_call,<6805.6647.0>,{ensure_loaded,icmp}},
>              {code_call,<6805.7232.0>,{ensure_loaded,icmp}},
>              {code_call,<6805.7274.0>,{ensure_loaded,icmp}},
>              {code_call,<6805.7304.0>,{ensure_loaded,icmp}},
>              {code_call,<6805.8889.0>,
>                         {ensure_loaded,mac_entries_record_handler}},
>              {code_call,<6805.8951.0>,
>                         {ensure_loaded,mac_entries_record_handler}},
>              {code_call,<6805.576.0>,
>                         {ensure_loaded,cross_connect_unicast_utils}},
>              {code_call,<6805.19300.12>,{ensure_loaded,shell}},
>              {code_call,<6805.20313.12>,{ensure_loaded,shell}},
>              {code_call,<6805.21339.12>,{ensure_loaded,dbg}},
>              {code_call,<6805.31109.13>,get_mode},
>              {code_call,<6805.1255.14>,get_mode},
>              {system,{<6805.2521.14>,#Ref<6805.0.23.35356>},get_status}]},
>   {links,[<6805.11.0>]},
>   {dictionary,[{any_native_code_loaded,false}]},
>   {trap_exit,true},
>   {error_handler,error_handler},
>   {priority,normal},
>   {group_leader,<6805.9.0>},
>   {total_heap_size,86071},
>   {heap_size,10958},
>   {stack_size,25},
>   {reductions,13172282},
>   {garbage_collection,[{min_bin_vheap_size,46422},
>                        {min_heap_size,233},
>                        {fullsweep_after,65535},
>                        {minor_gcs,71}]},
>   {suspending,[]}]
> admin@REDACTED:~$
>
> ________________________________
> From: erlang-questions-bounces@REDACTED <erlang-questions-bounces@REDACTED> on behalf of Matthew Evans <mattevans123@REDACTED>
> Sent: Friday, December 9, 2016 9:56 AM
> To: Erlang/OTP discussions
> Subject: [erlang-questions] R17 - Possible wedged scheduler
>
>
> Hi,
>
>
> We just hit a situation where it appeared that one scheduler was wedged. Some parts of our application were working, but others appeared to be stuck. I could connect via a cnode application and an escript, but I couldn't connect via the Erlang shell. We have an escript that does rpc calls; some worked, while others (e.g. anything involving the code server or tracing) failed.
>
>
> CPU load was minimal at the time, and heart didn't complain. We only have a single NIF, but this is not called on this hardware variant. We do use CNODE to talk to C applications.
>
>
> We are running R17, Intel quad core CPU on Debian.
>
>
> This is the first time this has been seen, so the questions are:
>
>
> 1. Has anyone seen this before?
>
> 2. What can we do to debug this if we hit the condition again?
>
> 3. Since heart doesn't detect this, can anyone think of any alternative mechanisms?
>
>
> Thanks
>
>
> Matt
>
