<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 17, 2021 at 1:45 PM Vyacheslav Levytskyy <<a href="mailto:v.levytskyy@yahoo.com">v.levytskyy@yahoo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Thank you for the details, I think they explain most of the
situation. I did check the messages: they were all specific to my
application, with no "{spawn_reply, Ref, ok|error, Pid|Error}" for
sure, just the usual '$gen_cast' and system messages. Judging from the messages,
the caller had been blocked for about 4 hours when I noticed it. The
node is an ordinary Erlang node, nothing special except for the
complicated environment. </p></div></blockquote><div>Is it an OTP 24 node as well? <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<p>The environment is Kubernetes with istio used for networking.
It's possible that one of the nodes of the cluster was restarted
abruptly, and maybe it was related to a version upgrade of the istio
networking, so we have either a restart of a node or a possible
networking glitch that broke the connection, and also a generally
interesting networking implementation. One surprising issue,
however, is that there were no timeouts and spawn_opt/5 just got stuck
in that state. Could it be related to the environment? If yes, and
the caller may be blocked in unfortunate circumstances in a
K8s/istio env, would you suggest a way to prevent such situations?</p></div></blockquote><div>If the connection to the other node is lost, the local runtime system will send a {spawn_reply, Ref, error, noconnection} message to the process blocked in spawn_opt(), which will cause spawn_opt() to return. The local runtime system detects that the connection is lost either by the TCP socket being closed or by detecting that there has been no incoming traffic during net_ticktime seconds <<a href="https://erlang.org/doc/man/kernel_app.html#net_ticktime">https://erlang.org/doc/man/kernel_app.html#net_ticktime</a>> (which defaults to 60 seconds). I guess that you haven't increased net_ticktime to more than 4 hours, which indicates that there is a bug somewhere.</div><div><br></div><div>Please open an issue at <<a href="https://github.com/erlang/otp/issues">https://github.com/erlang/otp/issues</a>> where we can continue this.</div><div><br></div><div>Regards,</div><div>Rickard<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<p>Thank you,<br>
Vyacheslav<br>
</p>
<div>On 16.06.2021 17:03, Rickard Green
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Jun 16, 2021 at 9:15
AM Vyacheslav Levytskyy <<a href="mailto:v.levytskyy@yahoo.com" target="_blank">v.levytskyy@yahoo.com</a>>
wrote:</div>
<div dir="ltr" class="gmail_attr"><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
The function doesn't interact with the gen_server that calls
spawn/4, <br>
although I'd expect spawn/4 to run a process and return
immediately <br>
anyway, am I wrong?<br>
<br>
</blockquote>
<div><br>
</div>
<div>All spawn operations except for spawn_request()
(introduced in OTP 23) are synchronous and block until the
new process has been created and the caller of the BIF has
received the process identifier of the newly created process
or an error is detected. If the connection between the
nodes stalls, the caller will be blocked until the network
ticker takes down the connection (default 60 seconds).<br>
</div>
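<div><br>
</div>
<div>For a caller that must not block, the asynchronous spawn_request() mentioned above can be combined with a receive timeout. The following is only a minimal sketch, assuming OTP 23 or later and the default reply tag (spawn_reply); the module name spawn_sketch and the helper spawn_with_timeout/5 are hypothetical and not part of OTP.</div>
<pre><code class="language-erlang">-module(spawn_sketch).
-export([spawn_with_timeout/5]).

%% Hypothetical helper (not an OTP function): request a spawn on Node and
%% wait at most Timeout milliseconds for the reply instead of blocking
%% until the connection is taken down.
spawn_with_timeout(Node, M, F, A, Timeout) -&gt;
    %% Returns immediately with a request identifier.
    ReqId = erlang:spawn_request(Node, M, F, A, []),
    receive
        {spawn_reply, ReqId, ok, Pid} -&gt; {ok, Pid};
        {spawn_reply, ReqId, error, Reason} -&gt; {error, Reason}
    after Timeout -&gt;
        %% Give up on the request. A return value of false means the
        %% reply has already been delivered, so pick it up instead.
        case erlang:spawn_request_abandon(ReqId) of
            true -&gt;
                {error, timeout};
            false -&gt;
                receive
                    {spawn_reply, ReqId, ok, Pid} -&gt; {ok, Pid};
                    {spawn_reply, ReqId, error, Reason} -&gt; {error, Reason}
                end
        end
    end.
</code></pre>
<div>A call such as spawn_sketch:spawn_with_timeout(Node, M, F, A, 5000) then returns {ok, Pid}, {error, Reason} or {error, timeout}. Note that, as far as I understand it, abandoning a request only guarantees that no reply will be delivered; the process may still have been created on the remote node.</div>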
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
>><br>
>> I'm surprised to see my gen_server process hanging
forever when<br>
>> executing spawn/4 call. Process info shows
spawn_opt/5 as a current<br>
>> function and status is waiting:<br>
>><br>
>> > process_info(P).<br>
>> [{current_function,{erlang,spawn_opt,5}},<br>
>> {status,waiting},<br>
>> {message_queue_len,13},<br>
>> {trap_exit,false},<br>
>> {priority,normal},<br>
>> ...]<br>
>><br>
</blockquote>
<div><br>
</div>
<div>It would have been interesting to know what process_info(P,
messages) returned. In the distributed case spawn_opt()
is waiting for a message of the form: {spawn_reply, Ref,
ok|error, Pid|Error}</div>
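<div><br>
</div>
<div>To make that check repeatable, something along these lines could be run on the node where the blocked process lives. It is only a sketch: the module name blocked_sketch and the function inspect/1 are hypothetical, and it simply collects the process_info items discussed above together with the currently configured net_ticktime.</div>
<pre><code class="language-erlang">-module(blocked_sketch).
-export([inspect/1]).

%% Hypothetical helper (not an OTP function). P must be a local pid,
%% since process_info/2 does not work on pids of other nodes.
inspect(P) when is_pid(P) -&gt;
    case erlang:process_info(P, [current_function, status, messages]) of
        undefined -&gt;
            %% The process is no longer alive.
            {error, noproc};
        Info -&gt;
            Msgs = proplists:get_value(messages, Info, []),
            %% Keep only messages shaped like a spawn reply.
            Replies = [M || {spawn_reply, _Ref, _Result, _Value} = M &lt;- Msgs],
            #{current_function =&gt; proplists:get_value(current_function, Info),
              status =&gt; proplists:get_value(status, Info),
              pending_spawn_replies =&gt; Replies,
              %% An integer (seconds), {ongoing_change_to, Seconds}, or
              %% ignored when the node is not distributed.
              net_ticktime =&gt; net_kernel:get_net_ticktime()}
    end.
</code></pre>
<div>If current_function is {erlang,spawn_opt,5}, status is waiting, and pending_spawn_replies stays empty for much longer than net_ticktime, that matches the situation described in this thread.</div>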
</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">What type of node is the node that you
are trying to spawn the new process on? Ordinary Erlang node,
C-node, ...? OTP release of that node?<br>
</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">Regards,</div>
<div class="gmail_quote">Rickard<br>
</div>
-- <br>
<div dir="ltr">Rickard Green,
Erlang/OTP, Ericsson AB</div>
</div>
</blockquote>
</div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature">Rickard Green, Erlang/OTP, Ericsson AB</div></div>