** exception exit: noconnection

Wed Dec 4 12:17:39 CET 2019

On 4. Dec 2019, at 11:31, Lukas Larsson <lukas@REDACTED> wrote:
> 
> 
> 
> On Wed, Dec 4, 2019 at 10:59 AM Adam Lindberg <hello@REDACTED> wrote:
> Ah, thanks! That’s good to know. :-) 
> 
> Maybe I missed it but I can’t find this documented anywhere (in e.g. erlang:link/1 or erlang:monitor/2). The only place I can find it referenced is in an example in the Getting Started User’s Guide: http://erlang.org/doc/getting_started/robustness.html
> 
> 
> It is mentioned under the Info section in the erlang:monitor/2 documentation.

Interesting, didn’t show up very early in my search results. Thanks for the pointers.

>  
> Perhaps it should be documented more prominently?
> 
> Yes it should, just as noproc is.

That would be great. I’ll prepare a PR.

>  
> The next question I need to clarify is: can gen_server processes never receive exit messages as normal info messages? If I enable trap_exit I only receive a call to terminate with the noconnection error eventually...
> 
> I'm not sure I understand what you mean.
> 
> When trapping exits, a gen_server will either get the exit in the terminate or handle_info callback. Which one depends on which process sends the exit signal. If it is the "parent" process, i.e. the process that started the gen_server, then the terminate callback will be called. If it is some other process, it is the handle_info callback that is called. The assumption here is that if the parent exits for any reason, you want to terminate your gen_server, but if a child or peer exits, then you want to handle that and possibly continue running.

Yeah, the exit signal coming from the parent is most likely my case. I think I painted myself into a very obscure corner here. I start some linked, unsupervised gen_server processes from a shell function, then run the tests with the help of those. Once the test process on the system under test dies with 'noconnection', the exit signal arrives at the shell process, which is the parent of the test processes.
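
To make the distinction concrete, here is a minimal sketch of a gen_server that traps exits. The module name exit_demo and the helper link_to/2 are made up for illustration and are not part of my test setup. An exit signal from a linked peer shows up in handle_info/2, while an exit signal from the parent ends up in terminate/2.

```erlang
-module(exit_demo).
-behaviour(gen_server).

-export([start_link/0, link_to/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% The calling process becomes the "parent" of the server.
start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Hypothetical helper: ask the server to link itself to Pid (a peer).
link_to(Server, Pid) ->
    gen_server:call(Server, {link_to, Pid}).

init([]) ->
    %% Without this flag, any non-normal exit signal simply kills the server.
    process_flag(trap_exit, true),
    {ok, #{}}.

handle_call({link_to, Pid}, _From, State) ->
    true = link(Pid),
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Exit signal from a linked process that is not the parent:
%% delivered as an ordinary message, and the server keeps running.
handle_info({'EXIT', Pid, Reason}, State) ->
    io:format("linked process ~p exited: ~p~n", [Pid, Reason]),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

%% Exit signal from the parent: the server stops and Reason is the
%% parent's exit reason, for example noconnection.
terminate(Reason, _State) ->
    io:format("terminating: ~p~n", [Reason]),
    ok.
```

Starting this from the shell with exit_demo:start_link() makes the shell process the parent, which matches my situation: whatever takes the shell process down is only ever reported through terminate/2.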

One thing that I think could be improved is the error printout in the shell:

    (test@REDACTED)1> my_test:start().
    Running...
    ** exception exit: noconnection
    (test@REDACTED)2>

Contrast with:

    (test@REDACTED)3> exit(foo).
    ** exception exit: foo

In the first case, it is actually not the function that raises the exception, but the shell process that receives an exit signal. It would be nice if there was a visual difference here. The intuitive thing to do is to run "catch my_test:start()", which obviously does nothing since it is not the function that crashes; it is a linked process started by the function that sends an exit signal to the running shell process. Perhaps something along the lines of:

    (test@REDACTED)1> my_test:start().
    Running...
    ** shell process received exit signal: noconnection
    (test@REDACTED)2>
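
In the meantime, one possible workaround is to run the test in a helper process, so that the exit signal hits the helper instead of the shell. This is only a rough sketch: the module safe_run and its run/1 function are hypothetical names, and it assumes the function passed in blocks until the test is done, since the linked processes go down together with the helper once it finishes.

```erlang
-module(safe_run).
-export([run/1]).

%% Run Fun in a fresh, monitored process and wait for the outcome.
%% Anything Fun links to is linked to the helper, not to the caller,
%% so an exit signal such as noconnection only kills the helper.
run(Fun) when is_function(Fun, 0) ->
    %% A unique tag separates a normal return from any other exit reason.
    Tag = make_ref(),
    {Pid, Ref} = spawn_monitor(fun() -> exit({Tag, Fun()}) end),
    receive
        {'DOWN', Ref, process, Pid, {Tag, Result}} ->
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    end.
```

With this, safe_run:run(fun my_test:start/0) should return {error, noconnection} to the shell instead of taking the shell process down.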

Cheers,
Adam

> 
> Cheers,
> Adam
> 
> > On 4. Dec 2019, at 09:54, Adam Lindberg <hello@REDACTED> wrote:
> > 
> > I do indeed have linked processes. I realized that this is perhaps why the exception is “uncatchable” in the shell: the shell process dies because it is linked to my test processes, while the function running the test hasn’t encountered an error yet.
> > 
> > Do links in Erlang always crash with {'EXIT', Pid, noconnection} when a node dies?
> > 
> > Cheers,
> > Adam
> > 
> >> On 4. Dec 2019, at 08:38, Lukas Larsson <lukas@REDACTED> wrote:
> >> 
> >> 
> >> 
> >> On Tue, Dec 3, 2019 at 11:56 AM Adam Lindberg <hello@REDACTED> wrote:
> >> Hi!
> >> 
> >> I’m running some tests using distributed Erlang. I set up a cluster of Erlang nodes doing Distributed Systems™ stuff, and a hidden node that has a connection to each of the nodes in that cluster. The hidden node orchestrates the test by starting all Erlang nodes as ports. It then starts a process (gen_server) on each node that manipulates stuff on that node. It also loads some mock modules among other things. The hidden node also has some managing gen_servers running locally, which some of the mocks make RPC calls to from the cluster nodes (to simulate and orchestrate mocked hardware components).
> >> 
> >> Now I wanted to test how my system behaves when killing some random nodes, chaos monkey style. So I picked the easiest option of using rpc:cast(RandomClusterNode, erlang, halt, [137]). However, now my test dies with the following obscure error: ** exception exit: noconnection. This even happens when first spawning a fun that then calls erlang:halt(137) (so as to avoid the RPC connection somehow breaking).
> >> 
> >> After searching a bit on the Internet it seems to be some internal uncatchable (!) error generated by Erlang [1][2], but it is not at all clear when it happens, and how to avoid it. After some debugging in the gen_servers running on the hidden node, I can see the error by setting process_flag(trap_exit, true) and printing it in terminate/2 but I still can’t catch it. I can’t even catch it in the shell by enclosing my run in a try-catch block! It’s almost not mentioned at all in the official documentation [3]. Most likely I’m setting up my test nodes and the application/test code in a way that generates this error, but I have no idea what exactly leads to it.
> >> 
> >> I guess I have two problems:
> >> 
> >> 1. What is the error, and how can I handle / avoid it?
> >> 
> >> I'm not sure, but could it be that your process is linked to a process on the remote side, and that what you are getting is a broken link error?
> >> 
> >> 2. Why is it not documented?
> >> 
> >> Cheers,
> >> Adam
> >> 
> >> 
> >> [1]: http://erlang.org/pipermail/erlang-questions/2012-April/066219.html
> >> [2]: http://erlang.org/pipermail/erlang-questions/2013-April/073246.html
> >> [3]: http://erlang.org/doc/getting_started/robustness.html
> >> 
> > 
>