** exception exit: noconnection

Wed Dec 4 09:54:39 CET 2019

I do indeed have linked processes. I now realize that this is probably why the exception is “uncatchable” in the shell: the shell process is linked to my test processes, so it is killed by the incoming exit signal even though the function running the test has not itself raised an error yet.
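
A minimal sketch of that distinction, under stated assumptions: the module name noconnection_link_demo and the function run/0 are made up for illustration, and a local process exiting with reason noconnection stands in for a linked process on a node that goes down. With trap_exit set, the exit signal from the link arrives as an ordinary message instead of killing the receiver. A try-catch cannot achieve this, because no exception is raised in the calling code.

```erlang
-module(noconnection_link_demo).
-export([run/0]).

%% Hypothetical helper, not part of the original test setup.
run() ->
    %% Convert incoming exit signals into {'EXIT', Pid, Reason} messages.
    OldFlag = process_flag(trap_exit, true),
    %% Assumption: this local process simulates a linked remote process
    %% whose node has just died.
    Pid = spawn_link(fun() -> exit(noconnection) end),
    Result =
        receive
            {'EXIT', Pid, noconnection} -> {node_lost, Pid};
            {'EXIT', Pid, Reason} -> {exited, Pid, Reason}
        after 5000 ->
            timeout
        end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, OldFlag),
    Result.
```

Calling noconnection_link_demo:run() returns {node_lost, Pid} rather than crashing the caller.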

Do links in Erlang always break with {'EXIT', Pid, noconnection} when a node dies?
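
For comparison, a sketch of the same situation observed through a monitor instead of a link, which may be an option when the test process should only be notified and never taken down. The module name noconnection_monitor_demo and run/0 are hypothetical, and again a local process exiting with reason noconnection stands in for a process on a dead node.

```erlang
-module(noconnection_monitor_demo).
-export([run/0]).

%% Hypothetical helper, not part of the original test setup.
run() ->
    %% Assumption: this local process simulates a remote process whose
    %% node has just died. spawn_monitor/1 creates no link, so the
    %% caller cannot be killed by the exit.
    {Pid, Ref} = spawn_monitor(fun() -> exit(noconnection) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> {down, Reason}
    after 5000 ->
        timeout
    end.
```

Calling noconnection_monitor_demo:run() returns {down, noconnection}, without needing trap_exit.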

Cheers,
Adam

> On 4. Dec 2019, at 08:38, Lukas Larsson <lukas@REDACTED> wrote:
> 
> 
> 
> On Tue, Dec 3, 2019 at 11:56 AM Adam Lindberg <hello@REDACTED> wrote:
> Hi!
> 
> I’m running some tests using distributed Erlang. I set up a cluster of Erlang nodes doing Distributed Systems™ stuff, and a hidden node that has a connection to each of the nodes in that cluster. The hidden node orchestrates the test by starting all Erlang nodes as ports. It then starts a process (gen_server) on each node that manipulates stuff on that node. It also loads some mock modules, among other things. The hidden node also has some managing gen_servers running locally, to which some of the mocks make RPC calls from the cluster nodes (to simulate and orchestrate mocked hardware components).
> 
> Now I wanted to test how my system behaves when killing some random nodes, chaos monkey style. So I picked the easiest option: rpc:cast(RandomClusterNode, erlang, halt, [137]). However, my test now dies with the following obscure error: ** exception exit: noconnection. This happens even when I first spawn a fun that then calls erlang:halt(137) (so as to avoid the RPC connection somehow breaking).
> 
> After searching a bit on the Internet, it seems to be some internal uncatchable (!) error generated by Erlang [1][2], but it is not at all clear when it happens or how to avoid it. After some debugging in the gen_servers running on the hidden node, I can see the error by setting process_flag(trap_exit, true) and printing it in terminate/2, but I still can’t catch it. I can’t even catch it in the shell by enclosing my run in a try-catch block! It’s barely mentioned in the official documentation [3]. Most likely I’m setting up my test nodes and the application/test code in a way that generates this error, but I have no idea what exactly leads to it.
> 
> I guess I have two problems:
> 
> 1. What is the error, and how can I handle / avoid it?
> 
> I'm not sure, but could it be that your process is linked to a process on the remote side? That is, what you are getting is a broken link error?
>  
> 2. Why is it not documented?
> 
> Cheers,
> Adam
> 
> 
> [1]: http://erlang.org/pipermail/erlang-questions/2012-April/066219.html
> [2]: http://erlang.org/pipermail/erlang-questions/2013-April/073246.html
> [3]: http://erlang.org/doc/getting_started/robustness.html
>