process waits forever spawn_opt/5

Vyacheslav Levytskyy v.levytskyy@REDACTED
Thu Jun 17 13:45:49 CEST 2021


Thank you for the details; I think they explain most of the 
situation. I did check the messages: they were all specific to my 
application, with no "{spawn_reply, Ref, ok|error, Pid|Error}" for sure, 
just the usual '$gen_cast' and system messages. Judging from the 
messages, the caller had been blocked for about 4 hours when I noticed 
it. The node is an ordinary Erlang node, nothing special except for the 
complicated environment.

The environment is Kubernetes with istio used for networking. It's 
possible that one of the nodes of the cluster was restarted abruptly, 
and maybe that was related to a version upgrade of the istio 
networking. So we have either a restart of a node or a possible 
networking glitch that broke the connection, plus a generally 
interesting networking implementation. One surprising issue, however, 
is that there were no timeouts and spawn_opt/5 just got stuck in that 
state. Could it be related to the environment? If yes, and the caller 
may be blocked under unfortunate circumstances in a K8s/istio 
environment, would you suggest a way to prevent such situations?
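
For reference, here is a minimal sketch of one possible approach, not a 
confirmed fix for this case. It assumes OTP 23 or later on both nodes 
and uses the asynchronous spawn_request/5 mentioned in the quoted reply 
below, together with the caller's own timeout. The module name 
spawn_with_timeout and the functions spawn_remote/5 and await_reply/2 
are hypothetical names chosen for illustration; they are not part of 
OTP.

```erlang
-module(spawn_with_timeout).
-export([spawn_remote/5]).

%% Hypothetical helper (not part of OTP): asks Node to spawn
%% Mod:Fun(Args) and waits at most Timeout milliseconds for the reply,
%% so the caller cannot block indefinitely.
spawn_remote(Node, Mod, Fun, Args, Timeout) ->
    %% With empty options the reply arrives as a message tagged
    %% with the default reply tag 'spawn_reply'.
    ReqId = erlang:spawn_request(Node, Mod, Fun, Args, []),
    case await_reply(ReqId, Timeout) of
        timeout ->
            case erlang:spawn_request_abandon(ReqId) of
                true ->
                    %% Request abandoned: no reply will be delivered.
                    {error, timeout};
                false ->
                    %% The request completed in the meantime, so the
                    %% reply should already be in the mailbox.
                    case await_reply(ReqId, 0) of
                        timeout -> {error, timeout};
                        Late -> Late
                    end
            end;
        Reply ->
            Reply
    end.

%% Selective receive on this particular request identifier only, so
%% unrelated messages (for example '$gen_cast') stay in the mailbox.
await_reply(ReqId, Timeout) ->
    receive
        {spawn_reply, ReqId, ok, Pid} -> {ok, Pid};
        {spawn_reply, ReqId, error, Reason} -> {error, Reason}
    after Timeout ->
        timeout
    end.
```

A call such as spawn_with_timeout:spawn_remote(Node, M, F, A, 5000) 
would then return {ok, Pid}, {error, Reason} or {error, timeout} 
instead of blocking. Note that an abandoned request may still result in 
a process being created on the remote node, so the spawned function 
should tolerate running without a waiting parent.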

Thank you,
Vyacheslav

On 16.06.2021 17:03, Rickard Green wrote:
>
>
> On Wed, Jun 16, 2021 at 9:15 AM Vyacheslav Levytskyy 
> <v.levytskyy@REDACTED> wrote:
>
>
>     The function doesn't interact with the gen_server that calls spawn/4,
>     although I'd expect spawn/4 to run a process and return immediately
>     anyway, am I wrong?
>
>
> All spawn operations except for spawn_request() (introduced in OTP 23) 
> are synchronous and block until the new process has been created and 
> the caller of the BIF has received the process identifier of the newly 
> created process or an error is detected. In case the connection 
> between the nodes stalls the caller will be blocked until the network 
> ticker takes down the connection (default 60 seconds).
>
>     >>
>     >> I'm surprised to see my gen_server process hanging forever when
>     >> executing spawn/4 call. Process info shows spawn_opt/5 as a current
>     >> function and status is waiting:
>     >>
>     >>   > process_info(P).
>     >> [{current_function,{erlang,spawn_opt,5}},
>     >>    {status,waiting},
>     >>    {message_queue_len,13},
>     >>    {trap_exit,false},
>     >>    {priority,normal},
>     >>    ...]
>     >>
>
>
> It would have been interesting to know what process_info(P, messages) 
> had returned. In the distributed case spawn_opt() is waiting for a 
> message of the format: {spawn_reply, Ref, ok|error, Pid|Error}
>
> What type of node is the node that you are trying to spawn the new 
> process on? Ordinary Erlang node, C-node, ...? OTP release of that node?
>
> Regards,
> Rickard
> -- 
> Rickard Green, Erlang/OTP, Ericsson AB