Long gen_server init question (not about order of messages :)

Tue May 18 15:46:36 CEST 2021

On Tue, 18 May 2021 at 14:13, Stanislav Ledenev <s.ledenev@REDACTED> wrote:
> I'd like to clarify the option with an unregistered server.
> It is very interesting, especially since some variant of it is used in production.
> But it sounds like the unregistered server starts outside of the supervision tree.
> Or maybe without OTP at all. Is it so?

No.

Where you'd ordinarily write something like this...

start_link(Opts) -> gen_server:start_link({local, ?SERVER}, ?MODULE, Opts, []).

...which registers ?SERVER as soon as possible, you'd defer that part
(leave out the {local, ?SERVER} part):

start_link(Opts) -> gen_server:start_link(?MODULE, Opts, []).

init(Opts) ->
    % return straight away; the slow part runs in handle_continue/2
    {ok, Opts, {continue, finish_init}}.

handle_continue(finish_init, State) ->
    % hard things
    register(?SERVER, self()),
    {noreply, State}.

And then the caller would be something like this:

do_call(Args) ->
    case whereis(?SERVER) of
        undefined -> not_ready;
        Pid -> gen_server:call(Pid, Args)
    end.

You get the idea, hopefully.
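
Put together as one module, the fragments above might look like the following. This is only a sketch: the module name deferred_server, the finish_init atom, the map used as state, and the timer:sleep/1 standing in for the hard work are all made up for illustration.

```erlang
-module(deferred_server).
-behaviour(gen_server).

-export([start_link/1, do_call/1]).
-export([init/1, handle_continue/2, handle_call/3, handle_cast/2]).

-define(SERVER, ?MODULE).

%% Start unregistered: note there is no {local, ?SERVER} argument.
start_link(Opts) ->
    gen_server:start_link(?MODULE, Opts, []).

%% Callers get not_ready until the name has been registered.
do_call(Args) ->
    case whereis(?SERVER) of
        undefined -> not_ready;
        Pid -> gen_server:call(Pid, Args)
    end.

%% Return immediately so that start_link/1 (and the supervisor) is not blocked.
init(Opts) ->
    {ok, #{opts => Opts}, {continue, finish_init}}.

handle_continue(finish_init, State) ->
    %% Stand-in for the slow initialisation ("hard things").
    timer:sleep(100),
    %% Only now does the process become visible under its name.
    true = register(?SERVER, self()),
    {noreply, State#{ready => true}}.

%% Placeholder behaviour: echo the request back.
handle_call(Request, _From, State) ->
    {reply, {ok, Request}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

With this, deferred_server:do_call(ping) returns not_ready while the sleep is running and {ok, ping} once the name is registered.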

There are a couple of rough edges here:
- If the name is already registered and you start the server again,
register/2 fails with badarg _after_ you've done the hard work. You can
mostly avoid that by checking whereis(?SERVER) before starting, but
there's a small race condition there. If you know that you'll only ever
start one, you can skip that part.
- There's a really small race between the whereis and the
gen_server:call. If the process registers in between, the caller just
gets not_ready a moment longer than necessary -- you can ignore that.
If the server process dies in between the two steps, the
gen_server:call exits. But you need to deal with that anyway.
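
One way to paper over both rough edges, as a rough sketch rather than anything we actually run: the module name safe_call and both function names are invented here. start_once/3 does the "check first" step (and still has the race mentioned above), and call/2 turns the exit from a dead pid into not_ready.

```erlang
-module(safe_call).
-export([start_once/3, call/2]).

%% Refuse to start if Name is already registered. Another process could
%% still register Name between this check and the server's own
%% register/2 call, so the race is narrowed, not removed.
start_once(Name, Module, Opts) ->
    case whereis(Name) of
        undefined -> gen_server:start_link(Module, Opts, []);
        Pid -> {error, {already_started, Pid}}
    end.

%% Treat "died between whereis/1 and gen_server:call/2" like "not started".
%% Any other exit (timeout, a crash during the call) still propagates.
call(Name, Request) ->
    case whereis(Name) of
        undefined ->
            not_ready;
        Pid ->
            try
                gen_server:call(Pid, Request)
            catch
                exit:{noproc, _} -> not_ready
            end
    end.
```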

All of the above can still live in a supervision tree -- supervised
processes don't have to be registered. The ID that you give to the
supervisor is only scoped to that supervisor, and isn't registered
elsewhere.
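
For example, a supervisor for such a server could look roughly like this. The names deferred_sup and deferred_server are hypothetical; the only assumption is that deferred_server exports a start_link/1 that does not register a name.

```erlang
-module(deferred_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

%% The supervisor itself is not registered either in this sketch.
start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    %% The id is only a key inside this supervisor; it is not a
    %% registered name and whereis/1 knows nothing about it.
    Child = #{id => deferred_server,
              start => {deferred_server, start_link, [[]]},
              restart => permanent,
              type => worker},
    {ok, {SupFlags, [Child]}}.
```

supervisor:which_children/1 on the supervisor pid lists the child under that id, whatever the child does or doesn't register later.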

The variant we have in production runs across two nodes. Node A
depends on a process in node B. When b_server finishes starting, it
notifies a named process in node A, which puts that pid in an ETS
table in A. If the process dies, or we get a nodedown message, A
clears the ETS table. Then the caller (in A) can grab the pid from the
ETS table and make a call to the process in B.
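
Very roughly, the named process on node A could be sketched as below. This is not our production code: the module name b_registry, the table name, and the message shapes are invented. It uses a monitor on the remote pid, which delivers a 'DOWN' message with reason noconnection if the connection to B's node is lost, so one clause covers both "process died" and "node went down" in this sketch.

```erlang
-module(b_registry).
-behaviour(gen_server).

-export([start_link/0, notify_ready/2, lookup/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(TABLE, b_registry_tab).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Called by b_server (on node B) once it has finished starting.
notify_ready(NodeA, Pid) ->
    gen_server:cast({?MODULE, NodeA}, {ready, Pid}).

%% Called by clients on node A; reads ETS directly, no message passing.
lookup() ->
    case ets:lookup(?TABLE, b_server) of
        [{b_server, Pid}] -> {ok, Pid};
        [] -> not_ready
    end.

init([]) ->
    ?TABLE = ets:new(?TABLE, [named_table, protected, {read_concurrency, true}]),
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast({ready, Pid}, State) ->
    Ref = erlang:monitor(process, Pid),
    true = ets:insert(?TABLE, {b_server, Pid}),
    {noreply, State#{Ref => Pid}};
handle_cast(_Msg, State) ->
    {noreply, State}.

%% Fires when the process dies or its node becomes unreachable.
handle_info({'DOWN', Ref, process, Pid, _Reason}, State) when is_map_key(Ref, State) ->
    %% delete_object/2 removes only this pid, not a newer replacement.
    true = ets:delete_object(?TABLE, {b_server, Pid}),
    {noreply, maps:remove(Ref, State)};
handle_info(_Info, State) ->
    {noreply, State}.
```

A caller on A would then do b_registry:lookup() and, on {ok, Pid}, gen_server:call(Pid, ...) as before.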

Now that I look back at it, it's overcomplicated for what it does (we'd
planned to have multiple instances of B and have each register for a
shard, but it turned out not to be needed). We could have just used
'global' or 'pg2' (now just 'pg').
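
With 'global', the deferred registration and the lookup might be sketched like this (module and function names are made up; register_when_ready/1 is meant to be called from the server's handle_continue/2 once the hard work is done):

```erlang
-module(global_deferred).
-export([register_when_ready/1, call/2]).

%% Register the calling process cluster-wide under Name.
%% global:register_name/2 returns yes or no rather than raising.
register_when_ready(Name) ->
    case global:register_name(Name, self()) of
        yes -> ok;
        no -> {error, already_registered}
    end.

%% Same shape as the local do_call, but the lookup works from any
%% connected node.
call(Name, Request) ->
    case global:whereis_name(Name) of
        undefined -> not_ready;
        Pid -> gen_server:call(Pid, Request)
    end.
```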