[erlang-questions] RFC: On `inet_tcp_dist` and `erl_epmd` interaction

Tue Oct 25 16:00:59 CEST 2011

    OK... Since not many people responded to my email (actually no
one :) ), I've prepared a small branch, based on the R14B04 release,
which implements what I proposed.

    My patches (3 small ones) are found at:
        https://github.com/cipriancraciun/otp/tree/patches/erl_epmd-as-proper-gen_server

    To fetch:
        git fetch git://github.com/cipriancraciun/otp.git patches/erl_epmd-as-proper-gen_server

    To compare my patches:
        https://github.com/cipriancraciun/otp/compare/patches%2Ferl_epmd-as-proper-gen_server
        https://github.com/cipriancraciun/otp/compare/patches%2Ferl_epmd-as-proper-gen_server.patch

    (I hope I got "submitting patches" right. :) )

    I'll wait a couple of days, and if there is no feedback (or better,
if there is positive feedback), I'll submit it to the `erlang-patches`
mailing list.

    Ciprian.


    P.S.: I've seen that there is another patch pending in the `pu`
branch, related to `erl_epmd`, which adds support for IPv6. I think my
patch won't apply cleanly on top of it (as we touch the same functions),
but from what I've seen the fix-up is trivial. How should I handle
this situation? (I think I should prepare a fourth patch to merge with
`pu`, right?)


On Mon, Oct 24, 2011 at 16:27, Ciprian Dorin Craciun
<ciprian.craciun@REDACTED> wrote:
> == Summary ==
>
>    I've found out that it is "theoretically" possible to override the
> behavior of the default `erl_epmd` module with a custom, but
> "compatible" module, without touching the `kernel` application (only
> through configuration directives). I've labeled this method as
> "theoretical" because the way in which the modules `erl_epmd` and
> `inet_tcp_dist` (or any of the `inet_*_dist` family) interact makes
> them inseparable.
>
>    I'm writing this email as I want to help in enabling the
> overriding of the default `erl_epmd` module in a correct, simple, and
> the least intrusive method possible. (By "I want to help" I mean I am
> offering to discuss, write, document and test the code.)
>
>
> == Problem description ==
>
>    As stated, there is a function `net_kernel:epmd_module`, which,
> according to the (source code) documentation, should (quote): "return
> module_name of erl_epmd or similar gen_server_module".
>        https://github.com/erlang/otp/blob/OTP_R14B04/lib/kernel/src/net_kernel.erl#L1283
>
>    Unfortunately its only usage is in `erl_distribution.erl` to start
> the `gen_server` process.
>        https://github.com/erlang/otp/blob/OTP_R14B04/lib/kernel/src/erl_distribution.erl#L39
>
>    All the other important modules (`inet_*_dist`, `net_adm`) use the
> `erl_epmd` module directly, without the `net_kernel` indirection.
>        https://github.com/erlang/otp/blob/OTP_R14B04/lib/kernel/src/inet_tcp_dist.erl#L70
>        https://github.com/erlang/otp/blob/OTP_R14B04/lib/kernel/src/inet_tcp_dist.erl#L254
>
>    As a result it is impossible to actually replace the way in which
> `inet_*_dist` modules resolve the transport layer address (more
> exactly the port) of the other nodes.
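
    To illustrate the missing indirection, here is a minimal sketch of
how a caller could resolve the module at call time instead of
hard-coding `erl_epmd`. It assumes `net_kernel:epmd_module/0` is
exported and that the configured module exports `port_please/2` with
the same contract as `erl_epmd`. The module name
`epmd_indirection_sketch` is made up for illustration and is not part
of OTP or of the patches.

```erlang
-module(epmd_indirection_sketch).
-export([port_please/2]).

%% Hypothetical helper (not part of OTP): ask net_kernel which module
%% is configured as the EPMD module, then dispatch to it, instead of
%% calling erl_epmd:port_please/2 directly.
port_please(Name, Address) ->
    Mod = net_kernel:epmd_module(),   % defaults to erl_epmd
    Mod:port_please(Name, Address).
```

    With the default configuration this behaves like a direct call to
`erl_epmd:port_please/2`; with a different module configured, the
lookup would go to that module instead.
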
>
>
> == Problem analysis ==
>
>    I think there are two possible purposes of `net_kernel:epmd_module`:
>    a) to give the name of a module which should export a `start_link`
> function, which in turn spawns a process, registering under the name
> `erl_epmd` and responding to `erl_epmd` messages in a proper manner
> (thus implementing the "internal" `erl_epmd` protocol); (and as a
> backend, maybe the TCP-based EPMD protocol;)
>    b) or to give the name of a module which should export the
> `register_node/2`, `port_please/2`, `names/0`, and `names/1` functions
> which should act according to the specs in `erl_epmd` (thus
> implementing the `erl_epmd` "interface" / behavior);
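
    As a sketch only, the "interface" reading in b) could be written
down as an old-style behaviour definition listing the four functions
named above. The module name `epmd_interface_sketch` is hypothetical,
and nothing like it exists in the `kernel` application.

```erlang
-module(epmd_interface_sketch).
-export([behaviour_info/1]).

%% Hypothetical behaviour: the functions a replacement module would
%% have to export under interpretation b). The arities are the ones
%% listed in the text above.
behaviour_info(callbacks) ->
    [{register_node, 2},
     {port_please, 2},
     {names, 0},
     {names, 1}];
behaviour_info(_Other) ->
    undefined.
```

    A replacement module would then declare
`-behaviour(epmd_interface_sketch).` and the compiler would warn about
any of these functions that it fails to export.
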
>
>    As such there is a decision between "implementing a message
> protocol" or "implementing an interface". I.e.:
>    * in the first case (implementing the `erl_epmd` internal
> protocol) the overriding module receives messages, and responds to
> them in a proper manner; but the "clients" still use the `erl_epmd`
> module as a frontend (which in turn sends messages to the named
> `erl_epmd` process);
>    * in the second case *all* clients should use the overriding
> module (via `net_kernel:epmd_module`), and this one in its turn is
> free to implement the "interface" functions as it sees fit as long as
> it doesn't break the spec;
>
>    Now, the way in which `net_kernel:epmd_module` is used (only once,
> to start the server) and the fact that all `inet_*_dist` modules use
> the `erl_epmd` module directly suggest that the initial plan was to
> go with solution a), i.e. the overriding module should register a
> process under the well-established name, and it should respond to
> messages. (This is also suggested by the documentation quote: "or
> similar gen_server_module".)
>
>    Unfortunately the way in which `erl_epmd` module is implemented
> suggests method b). Actually it is even worse:
>    * half of the functionality is implemented by delegating work to a
> `gen_server` process, see `register_node` function:
>        https://github.com/erlang/otp/blob/OTP_R14B04/lib/kernel/src/erl_epmd.erl#L108
>    * and half is implemented by directly executing the code in the
> "client" process, see `port_please` and `names` functions, which in
> turn call `get_port` and `get_names`:
>        https://github.com/erlang/otp/blob/OTP_R14B04/lib/kernel/src/erl_epmd.erl#L292
>        https://github.com/erlang/otp/blob/OTP_R14B04/lib/kernel/src/erl_epmd.erl#L418
>
>
> == Solution ==
>
>    In my opinion, method a) (as presented above, i.e. implementing
> the internal `erl_epmd` protocol by a named process) is the one most
> "in-line" with OTP principles. (But even b) could work.)
>
>    Thus, in order to touch the existing code as little as possible, I
> would propose to:
>    * update the `erl_epmd` module, so that all the "public" functions
> (i.e. `port_please`, `names`, etc.) in fact send a message through
> `gen_server:call` to the process registered under the `erl_epmd` name
> (as `register_node` does);
>    * have the default `handle_call` implementation in `erl_epmd`
> spawn a new process, in which it calls the internal `get_port` or
> `get_names` and replies to the original call via `gen_server:reply`
> (to keep the concurrency model as it is now, without serializing
> requests);
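
    The two points above can be sketched as a small standalone
`gen_server`. This is not the actual patch: the module and registered
name `epmd_sketch` are made up so that it does not clash with the real
`erl_epmd`, and `lookup_port/2` is a placeholder standing in for the
internal `get_port` function.

```erlang
-module(epmd_sketch).
-behaviour(gen_server).

-export([start_link/0, port_please/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Public function: only sends a request to the registered process.
port_please(Name, Host) ->
    gen_server:call(?MODULE, {port_please, Name, Host}, infinity).

init([]) ->
    {ok, nostate}.

%% Do the lookup in a separate process and answer with
%% gen_server:reply/2, so that slow lookups are not serialized.
handle_call({port_please, Name, Host}, From, State) ->
    spawn_link(fun() ->
                       gen_server:reply(From, lookup_port(Name, Host))
               end),
    {noreply, State};
handle_call(_Request, _From, State) ->
    {reply, {error, unknown_request}, State}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

%% Placeholder for the real lookup; always answers `noport`.
lookup_port(_Name, _Host) ->
    noport.
```

    A replacement module would keep the same message format and put
its own resolution logic where `lookup_port/2` is. Note that with
`spawn_link` a crash in the worker also takes down the server, which
may or may not be what one wants.
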
>
>
> == Conclusion ==
>
>    For me -- and the project I'm involved in -- it is really
> imperative to be able to replace the way in which ports are resolved.
> I could do this by branching OTP, and maintaining a set of patches.
> But I would prefer (and I think it could benefit others too) to "fix"
> the current situation.
>
>    As stated in the summary, I'm offering to write the patch and test
> it. But before I come up with a patch, I want to ask for feedback as
> maybe I've missed something. Therefore any feedback is very important
> to me.
>
>    Thanks for the time (as the email is quite long) :)
>    Ciprian.
>
>
>    P.S.: I can describe the reason I want to replace the current
> `erl_epmd` module in a different thread. (There are actually two
> different but related reasons, one not being directly tied to this
> problem, but both are related to the `-no_epmd` option, which I've
> tried to discuss in a previous thread.)