process waits forever spawn_opt/5

Rickard Green rickard@REDACTED
Thu Jun 17 22:41:23 CEST 2021


On Thu, Jun 17, 2021 at 1:45 PM Vyacheslav Levytskyy <v.levytskyy@REDACTED>
wrote:

> Thank you for the details; I think they explain most of the situation.
> I did check the messages: they were all specific to my application, with
> no "{spawn_reply, Ref, ok|error, Pid|Error}" for sure, just the usual
> '$gen_cast' and system messages. Judging from the messages, the caller
> had been blocked for about 4 hours when I noticed it. The node is an
> ordinary Erlang node, nothing special except for the complicated
> environment.
>
Is it an OTP 24 node as well?

> The environment is Kubernetes with istio used for networking. It's
> possible that one of the nodes of the cluster was restarted abruptly,
> maybe in connection with a version upgrade of istio networking. So we
> have either a node restart or a networking glitch that broke the
> connection, on top of a generally interesting networking implementation.
> One surprising issue, however, is that there were no timeouts and
> spawn_opt/5 just got stuck in that state. Could it be related to the
> environment? If so, and the caller can be blocked under unfortunate
> circumstances in a K8s/istio environment, could you suggest a way to
> prevent such situations?
>
If the connection to the other node is lost, the local runtime system will
send a {spawn_reply, Ref, error, noconnection} message to the process
blocked in spawn_opt(), which will cause spawn_opt() to return. The local
runtime system detects that the connection is lost either by the TCP socket
being closed or by detecting that there has been no incoming traffic for
net_ticktime seconds
<https://erlang.org/doc/man/kernel_app.html#net_ticktime> (which defaults
to 60 seconds). I guess that you haven't increased net_ticktime to more
than 4 hours, which indicates that there is a bug somewhere.
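
Until the cause is found, one way to keep a caller from blocking
indefinitely is to use the asynchronous spawn_request() (OTP 23 or later)
and wait for the reply with a timeout. What follows is only a minimal
sketch under that assumption, not a fix for the underlying problem. The
module name safe_spawn and the function spawn_with_timeout/5 are
hypothetical names chosen for illustration; erlang:spawn_request/5 and
erlang:spawn_request_abandon/1 are the OTP BIFs.

```erlang
-module(safe_spawn).
-export([spawn_with_timeout/5]).

%% Hypothetical helper (not part of OTP): request a spawn on Node and
%% wait at most Timeout milliseconds for the reply.
spawn_with_timeout(Node, Mod, Fun, Args, Timeout) ->
    %% spawn_request/5 returns a request id immediately; the result
    %% arrives later as a {spawn_reply, ReqId, ok | error, Result} message.
    ReqId = erlang:spawn_request(Node, Mod, Fun, Args, []),
    receive
        {spawn_reply, ReqId, ok, Pid} ->
            {ok, Pid};
        {spawn_reply, ReqId, error, Reason} ->
            %% For example noconnection when the connection is lost.
            {error, Reason}
    after Timeout ->
        %% Stop waiting. If the request can no longer be abandoned, the
        %% reply has already been delivered, so fetch it from the mailbox.
        case erlang:spawn_request_abandon(ReqId) of
            true ->
                {error, timeout};
            false ->
                receive
                    {spawn_reply, ReqId, ok, Pid} -> {ok, Pid};
                    {spawn_reply, ReqId, error, Reason} -> {error, Reason}
                end
        end
    end.
```

A call such as safe_spawn:spawn_with_timeout(Node, M, F, A, 5000) then
returns {ok, Pid}, {error, Reason} or {error, timeout} instead of blocking.
Note that after a timeout a process may still have been created on the
remote node. The tick time actually in effect on a node can be checked
with net_kernel:get_net_ticktime().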

Please open an issue at <https://github.com/erlang/otp/issues> where we
can continue this.

Regards,
Rickard

> Thank you,
> Vyacheslav
> On 16.06.2021 17:03, Rickard Green wrote:
>
>
>
> On Wed, Jun 16, 2021 at 9:15 AM Vyacheslav Levytskyy <
> v.levytskyy@REDACTED> wrote:
>
>
>> The function doesn't interact with the gen_server that calls spawn/4,
>> although I'd expect spawn/4 to run a process and return immediately
>> anyway, am I wrong?
>>
>>
> All spawn operations except for spawn_request() (introduced in OTP 23) are
> synchronous and block until the new process has been created and the caller
> of the BIF has received the process identifier of the newly created process
> or an error is detected. In case the connection between the nodes stalls
> the caller will be blocked until the network ticker takes down the
> connection (default 60 seconds).
>
>
>> >>
>> >> I'm surprised to see my gen_server process hanging forever when
>> >> executing spawn/4 call. Process info shows spawn_opt/5 as a current
>> >> function and status is waiting:
>> >>
>> >>   > process_info(P).
>> >> [{current_function,{erlang,spawn_opt,5}},
>> >>    {status,waiting},
>> >>    {message_queue_len,13},
>> >>    {trap_exit,false},
>> >>    {priority,normal},
>> >>    ...]
>> >>
>>
>
> It would have been interesting to know what process_info(P, messages)
> returned. In the distributed case spawn_opt() is waiting for a message
> of the form: {spawn_reply, Ref, ok|error, Pid|Error}
>
> What type of node is the node that you are trying to spawn the new process
> on? Ordinary Erlang node, C-node, ...? OTP release of that node?
>
> Regards,
> Rickard
> --
> Rickard Green, Erlang/OTP, Ericsson AB
>
>

-- 
Rickard Green, Erlang/OTP, Ericsson AB