gen_tcp:connect will close when using supervisor.

Sat Nov 21 03:47:33 CET 2020

Hi, George.

Replies are inline.

On 2020/11/19 8:26, George Hope wrote:
> I have a simple gen_server which connects to a tcp port, when I run 
> gen_server directly:
> erl -noshell -s ttest start
> it works fine, But when I run:
> erl -noshell -s tt_sup start_link
> the connection will be closed instantly, it will work if I unlink the 
> supervisor.
> How could I run my simple gen_server with supervisor ?

It looks like what is going on below is a problem of mistaken execution 
context. For example, let's look at tt_sup:start_link/0...

> -module(tt_sup).
> -behaviour(supervisor).
> -export([init/1, start_link/0, start_shell/0]).
> 
> start_link() ->
>      io:format("Supervisor started with PID~p~n", [self()]),
>      {ok, Pid} = supervisor:start_link(tt_sup, []),
>      io:format("Supervisor PID=~p~n", [Pid]),
>      {ok, Pid}.

The code above prints a message to stdout saying that a supervisor has 
started with PID self(), but that is not the new supervisor's PID. 
It is the PID of whatever process is calling tt_sup:start_link/0. The 
tt_sup's PID is the one you receive as the return value of 
supervisor:start_link/2 (which you capture and also print). The trouble 
here is that if the caller exits then, because of the link, the freshly 
spawned tt_sup will also exit (a supervisor traps exits and shuts down 
when the process that started it exits, even with reason `normal`), and 
because tt_sup is linked to its worker ttest, the worker will be taken 
down as well.
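
To see the difference between the caller's PID and the supervisor's PID 
for yourself, here is a minimal sketch. The module name `who_is_self` and 
the function show/0 are made up for illustration and are not part of your 
code; the supervisor has no children because only the link matters here.

```erlang
-module(who_is_self).
-behaviour(supervisor).
-export([show/0, init/1]).

%% Supervisor callback: an empty supervisor is enough for this demo.
init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 1, period => 5}, []}}.

%% Start a supervisor and report who is who.
show() ->
    Caller = self(),                          % the process running show/0
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    %% Ask the runtime which processes the new supervisor is linked to.
    {links, Links} = erlang:process_info(Sup, links),
    Linked = lists:member(Caller, Links),
    io:format("caller=~p supervisor=~p linked=~p~n", [Caller, Sup, Linked]),
    %% Clean up so the demo leaves nothing behind.
    unlink(Sup),
    exit(Sup, shutdown),
    {Caller, Sup, Linked}.
```

Calling who_is_self:show() from the shell should print two different 
PIDs and report that the supervisor is linked to the caller.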

It appears that the process that kicks things off when you call `erl 
-noshell -s ...` exits immediately after calling its target function. 
Since application:start* is not being used, the Erlang runtime's 
application supervisor is not there to own and monitor your supervisor 
as would normally happen.

> start_shell() ->
>      io:format("Supervisor started with PID~p~n", [self()]),
>      {ok, Pid} = supervisor:start_link(tt_sup, []),
>      io:format("Supervisor PID=~p~n", [Pid]),
>      unlink(Pid),
>      {ok, Pid}.

This version does the same thing, but the final action taken before 
returning the PID is to unlink. Remember, this is the caller unlinking 
from the freshly spawned supervisor, not the supervisor unlinking its 
child -- and recall that the caller in this case is the temporary 
process spawned by `erl -noshell`, not a long-lived application supervisor.

So in both cases your ttest *worker* actually is under supervision; the 
only question is whether your supervisor is linked to whatever called it.
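
The effect of the two variants above can be reproduced in isolation. This 
is only a sketch under my own assumptions: the module `link_demo` and the 
function survives/1 are hypothetical names, the short-lived spawned 
process stands in for whatever `erl -noshell -s` uses to call your 
function, and the supervisor has no children.

```erlang
-module(link_demo).
-behaviour(supervisor).
-export([survives/1, init/1]).

%% Supervisor callback: no children, we only watch the supervisor itself.
init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 1, period => 5}, []}}.

%% Mode is `link` (like start_link/0 above) or `unlink` (like start_shell/0).
%% Returns true if the supervisor is still alive after its caller has exited.
survives(Mode) when Mode =:= link; Mode =:= unlink ->
    Parent = self(),
    {Caller, CallerRef} = spawn_monitor(fun() ->
        {ok, Sup} = supervisor:start_link(?MODULE, []),
        case Mode of
            unlink -> unlink(Sup);
            link -> ok
        end,
        Parent ! {sup, self(), Sup},
        %% Wait until the parent is watching, then return (exit `normal`).
        receive go -> ok end
    end),
    SupPid = receive {sup, Caller, P} -> P end,
    SupRef = erlang:monitor(process, SupPid),
    Caller ! go,
    %% Make sure the caller is really gone before judging the supervisor.
    receive {'DOWN', CallerRef, process, Caller, _} -> ok end,
    receive
        {'DOWN', SupRef, process, SupPid, _Reason} -> false
    after 500 ->
        %% Still alive: stop watching and clean up the orphaned supervisor.
        erlang:demonitor(SupRef, [flush]),
        exit(SupPid, kill),
        true
    end.
```

I would expect link_demo:survives(link) to return false and 
link_demo:survives(unlink) to return true, which matches what you are 
seeing from the command line.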

In a normal application you want an application supervisor to be 
starting your top level supervisor and it should be linked to it, so 
supervisor:start_link/* is the appropriate thing to call (and in fact 
there are no supervisor:start_monitor or supervisor:start functions 
because it is expected that supervisors will be written in the context 
of OTP compliant applications). In simple command line execution (via 
escript or another utility) it is a little more gray whether or not you 
really care about having it be supervised.
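
As a rough sketch of the "normal application" case, the module below 
acts as both the application callback and the top level supervisor. 
Everything named `tt_demo_app` / `tt_demo_sup` is invented for this 
example, and loading the application spec from a tuple in boot/0 is just 
a shortcut so the demo needs no .app file; a real project would ship a 
.app file and usually keep the supervisor in its own module.

```erlang
-module(tt_demo_app).
-behaviour(application).
-behaviour(supervisor).
-export([boot/0, start/2, stop/1, init/1]).

%% Load an in-memory application spec (instead of a .app file) and start it.
%% Returns the PID of the top level supervisor.
boot() ->
    Spec = {application, tt_demo_app,
            [{description, "Demo: top supervisor started by an application"},
             {vsn, "0.1.0"},
             {modules, [tt_demo_app]},
             {registered, [tt_demo_sup]},
             {applications, [kernel, stdlib]},
             {mod, {tt_demo_app, []}}]},
    ok = application:load(Spec),
    ok = application:start(tt_demo_app),
    whereis(tt_demo_sup).

%% Application callback: runs in a process owned by the application
%% machinery, so the link made here is to a long-lived process.
start(_Type, _Args) ->
    supervisor:start_link({local, tt_demo_sup}, ?MODULE, []).

%% Application callback: nothing to clean up.
stop(_State) ->
    ok.

%% Supervisor callback: no children in this sketch; a worker such as
%% ttest would be listed as a child spec here.
init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 1, period => 5}, []}}.
```

Here the supervisor is linked to a process that belongs to the running 
application rather than to whoever called boot/0, so the caller can exit 
without taking the supervisor with it.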

Two ways to make this work from the command line are to make it an 
escript, or to wrap it up as an OTP application and have a utility launch 
it that provides a full execution context. There are release builders 
like rebar3 that can do this, or I have a project that provides a more 
dynamic execution environment that makes writing and executing Erlang 
feel a little more like working with Python.

Writing full-blown Erlang applications involves a little bit of extra 
scaffolding and a touch of boilerplate to get started, but the benefits 
are immense, so it's worth it.

[WARNING: A shameless plug for my own project follows...]

You might find this useful: https://zxq9.com/projects/zomp/
Here is a video talk-through of using it to build a chat server: 
https://www.youtube.com/watch?v=yyM4N8cuau0

Using that I do `zx create project` and follow the (somewhat overly 
verbose) prompts. Select "CLI application" (if you just want it run 
similar to a script) or "Traditional Erlang application" if you want it 
to be supervised like a normal long-lived Erlang application. ZX will 
template a project for you of either style.

The default CLI application is basically a "hello, world" that runs in a 
full execution environment, and the default "Erlang application" 
template is a simple telnet chat/echo server (the starting point that 
gets modified in the video above). You can modify either one to do what 
you want. 
There are comments in the templated source files that explain what all 
the pieces do.

To run the application from the project directory use `zx runlocal`; to 
run it from anywhere else use `zx rundir [path to project]`. ZX will 
automatically build or rebuild to pick up whatever changes you've made.

The most important thing you can take away from the above is to learn 
how OTP applications are structured so that you can live in that world 
comfortably and not be confused. It doesn't matter whether you use ZX, 
rebar3, erlang.mk or run your projects by hand as you were above; the 
point is to grok what is happening and to always be able to answer 
"what process is running the piece of code I'm looking at right now?"

Hopefully this explains more than it confuses.

Have fun making stuff!
-Craig