** exception exit: noconnection
Tue Dec 3 11:56:28 CET 2019
I’m running some tests using distributed Erlang. I set up a cluster of Erlang nodes doing Distributed Systems™ stuff, and a hidden node that has a connection to each of the nodes in that cluster. The hidden node orchestrates the test by starting all Erlang nodes as ports. It then starts a process (gen_server) on each node that manipulates stuff on that node. It also loads some mock modules, among other things. The hidden node also has some managing gen_servers running locally, which some of the mocks on the cluster nodes make RPC calls to (to simulate and orchestrate mocked hardware components).
Now I wanted to test how my system behaves when some random nodes are killed, chaos monkey style. So I picked the easiest option: rpc:cast(RandomClusterNode, erlang, halt, [137]). However, my test now dies with the following obscure error: ** exception exit: noconnection. This happens even when I first spawn a fun that then calls erlang:halt(137) (so as to avoid the RPC connection somehow breaking).
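Roughly, the kill step can be sketched as follows. This is a minimal sketch under assumptions, not the actual test code: the module name chaos and the function names pick_random/1, kill/1 and kill_spawned/1 are hypothetical, the argument list [137] is assumed from the erlang:halt(137) call mentioned above, and the spawned variant uses a module/function/arguments triple instead of a fun.

```erlang
-module(chaos).
-export([pick_random/1, kill/1, kill_spawned/1]).

%% Pick one node at random from a non-empty list of cluster nodes.
pick_random(Nodes) when Nodes =/= [] ->
    lists:nth(rand:uniform(length(Nodes)), Nodes).

%% Halt the given node directly through an asynchronous RPC.
%% (Exit status 137 is an assumption taken from the text.)
kill(Node) ->
    rpc:cast(Node, erlang, halt, [137]).

%% Variant: ask the node to spawn a separate process that halts it,
%% so the halt does not run inside the RPC request itself.
kill_spawned(Node) ->
    rpc:cast(Node, erlang, spawn, [erlang, halt, [137]]).
```

From the hidden node this would be used as chaos:kill(chaos:pick_random(ClusterNodes)), where ClusterNodes is the list of node names the test started.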
After searching a bit on the Internet, it seems to be some internal uncatchable (!) error generated by Erlang, but it is not at all clear when it happens or how to avoid it. After some debugging in the gen_servers running on the hidden node, I can see the error by setting process_flag(trap_exit, true) and printing it in terminate/2, but I still can’t catch it. I can’t even catch it in the shell by enclosing my run in a try-catch block! It’s barely mentioned in the official documentation. Most likely I’m setting up my test nodes and the application/test code in a way that generates this error, but I have no idea what exactly leads to it.
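The debugging setup described above can be sketched like this. It is a minimal sketch, not the actual managing gen_server: the module name exit_probe is hypothetical and the callbacks do nothing beyond printing the termination reason.

```erlang
-module(exit_probe).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Trap exits so that terminate/2 also runs when the parent exits.
    process_flag(trap_exit, true),
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Print whatever reason the server is terminating with.
terminate(Reason, _State) ->
    io:format("terminating with reason: ~p~n", [Reason]),
    ok.
```

Printing in terminate/2 only makes the reason visible; it does not stop the exit from propagating.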
I guess I have two problems:
1. What is the error, and how can I handle / avoid it?
2. Why is it not documented?