EEP 53: Process aliases preventing late replies reaching clients

Wed Sep 2 17:31:37 CEST 2020

A lot of this seems neat and useful.
I am however getting a bit of an odd feeling from this section:

> * `monitor/3`. The `monitor/3` BIF is an extension of the `monitor/2` BIF
>   where the third argument is an option list. As of its introduction the
>   only accepted option is `{alias, unalias | demonitor | reply_demonitor}`.
>   The first element of the two-tuple indicates that we want the returned
>   monitor reference to also work as an alias. The second element determines
>   how the alias should be deactivated:
>   * `unalias` - The alias will remain until it has been deactivated by the
>     `unalias/1` BIF
>   * `demonitor` - The alias will be deactivated when the monitor is
>     deactivated. That is, either when the `demonitor()` BIF is called on
>     the monitor, or when the monitor is automatically deactivated by the
>     reception of a `'DOWN'` message. The alias can still be deactivated
>     before this happens by calling the `unalias/1` BIF.
>   * `reply_demonitor` - The alias will be deactivated when either the
>     monitor is deactivated or a message that has been passed using the
>     alias is received. If the alias is deactivated due to a message passed
>     using the alias, the monitor is also deactivated as if the
>     `demonitor()` BIF had been called.

So the tuple is `{alias, TeardownMethod}`. How is `reply_demonitor`
supposed to work with regard to "received"? Is the alias removed when the
message is inserted in the mailbox (and can therefore be faster than
demonitor + [flush]), when the message is processed by a receive expression,
or only when it actually matches a receive clause? Would it be possible to
extend this mechanism to regular monitors?
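
To make the question concrete, here is a rough sketch of how I read the
proposed option being used in a client call. The module name, the helper
names, and the `{request, Ref, Req}` / `{reply, Ref, Rep}` message shapes are
my own assumptions for illustration; only `monitor/3` with
`{alias, reply_demonitor}` and sending to a reference come from the quoted
text, and I am assuming they behave as described there.

```erlang
-module(alias_sketch).
-export([call/3, echo_server/0]).

%% Hypothetical client helper. The message shapes are assumptions made
%% for this sketch and are not part of the EEP.
call(Server, Request, Timeout) ->
    %% The returned reference is both the monitor reference and the alias.
    Ref = erlang:monitor(process, Server, [{alias, reply_demonitor}]),
    Server ! {request, Ref, Request},
    receive
        {reply, Ref, Reply} ->
            %% Per the quoted text, the alias and the monitor are both
            %% deactivated once a message sent via the alias is received.
            {ok, Reply};
        {'DOWN', Ref, process, _Pid, Reason} ->
            {error, {server_down, Reason}}
    after Timeout ->
        %% Deactivates the monitor and, with it, the alias. The flush
        %% option only removes a 'DOWN' message already in the mailbox.
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.

%% Hypothetical server loop: it replies through the alias rather than
%% through the client's pid.
echo_server() ->
    receive
        {request, Alias, Request} ->
            Alias ! {reply, Alias, Request},
            echo_server()
    end.
```

Whether a reply that lands between the timeout firing and the `demonitor`
call can still end up in the client's mailbox depends on which of the three
readings of "received" is the intended one.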

Wouldn't it make sense for `erlang:demonitor(Ref, Opts)` to have the option
list contain `unalias` instead of specifying that when setting up the
monitor?

On Wed, Sep 2, 2020 at 1:37 AM Rickard Green <rickard@REDACTED> wrote:

>     Author: Rickard Green <rickard(at)erlang(dot)org>
>     Status: Draft
>     Type: Standards Track
>     Erlang-Version: 24.0
>     Created: 01-Sept-2019
>     Post-History:
> ****
> EEP 53: Process aliases preventing late replies reaching clients
> ----
>
>
> Abstract
> ========
>
> Currently there exists no lightweight mechanism for preventing late replies
> from a server to a client after a timeout or connection loss has occurred.
> The only way to prevent late replies today is to make the request via
> a proxy process.
>
> The proposed process alias feature is a lightweight mechanism that solves
> the above problem. A process alias is similar to a registered name that
> is used temporarily while a request is outstanding. If the request times
> out or the connection to the server is lost, the alias is deactivated which
> prevents a late reply from reaching the client.
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> Specification
> =============
>
> An alias is of the Erlang type `reference()` and can be used as the
> destination when sending with the `!` operator or with the `erlang:send()`
> and `erlang:send_nosuspend()` BIFs. An alias can be used both on the local
> node and on remote nodes in a distributed system. The alias identifies a
> process that exists, or has existed, on the node with the node name returned
> by `node(Alias)`.
>
> With this change, all references are accepted as destinations in the
> message send operations listed above. If the reference is not an alias, or
> is a former alias that has been deactivated, the message is silently dropped.
>
> These new BIFs are introduced:
>
> * `alias/0`, `alias/1`. The `alias()` BIF creates and returns an alias which
>   can be used when sending messages to the process that called the `alias()`
>   BIF.
> * `unalias/1`. The `unalias(Alias)` BIF deactivates an alias that identifies
>   the calling process. The BIF returns `true` if the alias `Alias` identified
>   the calling process and thus was deactivated; otherwise, no change of the
>   alias state was made and `false` is returned.
> * `monitor/3`. The `monitor/3` BIF is an extension of the `monitor/2` BIF
>   where the third argument is an option list. As of its introduction the
>   only accepted option is `{alias, unalias | demonitor | reply_demonitor}`.
>   The first element of the two-tuple indicates that we want the returned
>   monitor reference to also work as an alias. The second element determines
>   how the alias should be deactivated:
>   * `unalias` - The alias will remain until it has been deactivated by the
>     `unalias/1` BIF
>   * `demonitor` - The alias will be deactivated when the monitor is
>     deactivated. That is, either when the `demonitor()` BIF is called on
>     the monitor, or when the monitor is automatically deactivated by the
>     reception of a `'DOWN'` message. The alias can still be deactivated
>     before this happens by calling the `unalias/1` BIF.
>   * `reply_demonitor` - The alias will be deactivated when either the
>     monitor is deactivated or a message that has been passed using the
>     alias is received. If the alias is deactivated due to a message passed
>     using the alias, the monitor is also deactivated as if the
>     `demonitor()` BIF had been called.
>
> The `spawn_opt()` and `spawn_request()` BIFs have also been extended to
> accept an option `{monitor, MonitorOpts}` where `MonitorOpts` corresponds to
> the option list of the `monitor/3` BIF.
>
> Full documentation of these BIFs and options can be found via
> [pull request #2735](https://github.com/erlang/otp/pull/2735)
> containing the reference implementation.
>
> It is not possible to retrieve the process identifier of the process
> identified by an alias, and it is not possible to test if a reference is an
> alias or not.
>
>
> Motivation
> ==========
>
> As previously stated, it is possible to prevent late replies by using a
> proxy process that forwards the reply to the client. By spawning a proxy
> process and sending its process identifier to the server instead of the
> client's own process identifier, the proxy can be terminated when the
> operation times out or the connection is lost. Since the proxy process
> is then no longer alive, a late reply will be silently dropped and no stray
> message will reach the client that made the request. This, however, makes
> the code both more complicated and less efficient than it needs to be. The
> inefficiency comes from both the need to create, schedule, execute, and
> terminate the proxy process and the extra copying of data via the proxy
> process.
>
> When the author of the client code has full control over the client process
> such late replies can be handled without a proxy since the code can be
> aware of these potential stray messages and drop them when received. This
> is, however, not possible when implementing library code. You then either
> need to use a proxy process, as done by the `gen_statem` behavior, or
> accept that the client process may get stray messages after a call, as
> done by the `gen_server` behavior.
>
> Process aliases solve these issues with very small overhead.
>
>
> Rationale
> =========
>
> Why use the reference data type for alias?
> ------------------------------------------
>
> This is more or less what the reference data type is there for: a data type
> that can identify a huge number of different entities. References are unique
> and contain a node identifier identifying the node they originate from.
> This makes it easy to identify a specific process on a specific node while
> also distinguishing different aliases created by the same process. The
> embedded node identifier makes it easy to provide distribution transparency.
>
> Why not make alias an opaque data type?
> ---------------------------------------
>
> The most common expected use case is a client-server request, such as
> `gen_server:call()`. Client-server requests in Erlang are typically made
> while monitoring the server from the client. In order to minimize the data
> produced and sent in the request we want to reuse the reference created for
> identification of the monitor to also function as an alias. Since the
> monitor identifier is documented as a reference and is not opaque (which one
> can argue was a design mistake when introducing monitors), it becomes hard
> not to document the type of an alias as a reference as well.
>
> Why not allow references as registered names in the already existing API?
> -------------------------------------------------------------------------
>
> There are two reasons: distribution transparency and scalability.
>
> Distribution transparency is really desirable since the user can use the
> functionality the same way regardless of whether it is a node-local or a
> node-remote operation. The name registration API is not distribution
> transparent.
>
> Regarding scalability: due to how the name registration API has been
> designed, we need some sort of table in order to implement the API. This
> table will be written to and read from by processes that are executing in
> parallel. In the use case we are focusing on, names (aliases) are expected
> to be temporary and created in huge numbers. That is, there will be a large
> number of modifications of this table from processes executing on different
> processors. This makes it challenging to implement a table that scales
> well.
>
> In the proposed solution the information needed to route the message to the
> correct place is saved in the alias itself, the reference. The information
> needed to determine if the message passed via the alias should be dropped
> or passed along is saved in the process identified by the alias. That is,
> all information needed is distributed to where it is needed instead of
> being centralized in a node global table. This approach of distributed
> information introduces no new synchronization points at all when it has been
> fully implemented (more on that below), which will scale extremely well. An
> implementation based on a node global table can *never* compete with that
> in terms of scalability.
>
> The already existing functionality for registered names cannot be
> implemented using this distributed information approach, but needs this
> centralized storage of names. That is, the already existing API cannot be
> used.
>
> Besides the node identifier, a reference today contains three 32-bit words
> of data, or in other words 96 bits of data. Of these 96 bits only 82 bits
> are allowed to be passed over the distribution to another node, for
> historical reasons. While a reference resides locally it can however contain
> a more or less unlimited amount of data. 82 bits are not enough to make a
> reference unique on the local node and at the same time uniquely identify a
> node-local process. In order to be able to store all information needed in
> an alias, the reference data type needs to be extended.
>
> In the proposed solution references used as aliases are extended to use
> five 32-bit words on 64-bit architectures and four 32-bit words on 32-bit
> architectures. Since that much data in a reference cannot be passed over
> the distribution today, the reference implementation saves aliases that
> are alive in a node global table. When a node-local alias enters the local
> node over the distribution, it has to be looked up in this table in order
> to restore it to its actual value. While aliases are passed around
> locally there is no need for look-ups in this table.
>
> The reference implementation also modifies the distribution protocol to
> allow references with up to five 32-bit values. For backwards compatibility
> reasons this modification of the distribution protocol cannot be used at
> once when aliases are introduced, since we need to be able to communicate
> with older nodes from previous releases for a while. When this has been
> living in the system for enough time (expected to be OTP 26) we can begin
> sending references with up to five 32-bit words and remove the usage of the
> table mapping references over the distribution to aliases. That is, it is
> not until this happens that the alias implementation is fully complete.
>
> Why is it not possible to get the PID of the process that an alias refers to?
> -----------------------------------------------------------------------------
>
> Most importantly, there is no need to know the PID of the process that an
> alias refers to in order to solve the problems that aliases are intended
> to solve. The user is expected to use aliases in a protocol where one knows
> whether a reference is an alias or not, and should not need to know the PID
> of the process that it refers to.
>
> Besides the above, there are also other issues with such functionality. The
> content of a reference is just a large integer. In order to keep
> distribution transparency one would either have to specify how this integer
> should be interpreted or require synchronous signaling with the node where
> the identified process resides. The synchronous signaling would be very
> expensive. By specifying how the reference integer should be interpreted we
> would prevent future changes to that interpretation, which might prevent
> future optimizations, improvements and new features. Up until the time when
> large references with five 32-bit words can be passed over the
> distribution, synchronous communication is also the only option for
> implementing such functionality.
>
> If we were to mimic the `whereis()` function of the registered name API,
> where you can also see if a name is currently registered, no other option
> than synchronous signaling with the process identified by the alias is
> possible.
>
> Why is it not possible to test if a reference is an alias?
> ----------------------------------------------------------
>
> The same reason as to why it is not possible to get the PID of the
> process that is referred to by an alias.
>
> Why not allow registration of arbitrary Erlang terms instead?
> -------------------------------------------------------------
>
> Such a feature could solve the same issue that aliases are intended to
> solve, but there are problems with such an approach.
>
> Terms other than pids, ports, and references do not have a node identifier
> embedded into the data type. For such data types you need some other way
> to identify the node where the name is registered. In the current case
> of atoms as registered names, this is done by wrapping the name in a
> two-tuple that contains the node name. Something like this is needed for
> all terms other than plain pids, ports, and references. This also
> introduces a problem: is a two-tuple just a name, or a name plus a node
> identifier?
>
> Should it be possible to register a PID as a name for another process?
> This would force all send operations to first look up the PID in the
> table of registered names before performing the operation. This will
> cost performance in all send operations. The same is true for ports.
>
> We don't think registration of arbitrary terms should be implemented
> due to the problems that arise. The current registration feature, which only
> allows atoms, can however be a bit too limiting when you need to register
> a number of processes for the same service. An option could be to allow
> registration of two-tuples containing an atom and an integer. Perhaps
> other terms such as strings should also be allowed, but arbitrary terms
> should not be allowed.
>
> Allowing references as registered names implies scalability bottlenecks
> not present in the alias API. That is, this would be an inferior solution
> to the problem we set out to solve.
>
> One probably wants to extend name registration with more allowed terms
> than just atoms, but that would be to solve other problems than those
> aliases are intended to solve. The name registration API does not fit
> aliases, so we don't see that aliases should be combined with such an
> extension of the registration API. The alias solution solves the problem we
> set out to solve, so this EEP is limited to that.
>
>
> Backwards Compatibility
> =======================
>
> The alias feature is a pure extension, so there are no real backwards
> compatibility issues.
>
> In order to be able to exchange aliases with Erlang nodes from
> previous releases we cannot pass large references over the distribution
> and therefore need to keep information about aliases in a node global
> table. The implementation benefits from being able to pass larger
> references over the distribution, but will not do so until we can make
> it mandatory to be able to handle such large references. Both OTP 24
> and OTP 25 will be able to handle large references over the distribution
> and since we only guarantee distribution compatibility with the two
> closest releases backwards and forwards we can then make large
> references mandatory in OTP 26.
>
> This node global table for aliases introduces an overhead when utilizing
> aliases compared to sending using the PID of the process. This is due
> to allocation and manipulation of table structures. Compared to the
> existing solution of utilizing a proxy process in order to
> prevent stray messages, the overhead of this node global table for
> aliases is small. Fortunately this node global table also only needs to
> be present temporarily and can be removed in OTP 26.
>
>
> Reference Implementation
> ========================
>
> The reference implementation is provided by
> [pull request #2735](https://github.com/erlang/otp/pull/2735).
>
> Besides the implementation of the alias feature, the pull request also
> contains usage of aliases in the gen behaviors such as `gen_server`. Due to
> this it is now also possible to implement `receive_response()` functionality
> similar to `erpc:receive_response()`, which has also been implemented:
>
> * `gen_server:receive_response/2`
> * `gen_statem:receive_response/2`
> * `gen_event:receive_response/2`
>
>
> --
> Rickard Green, Erlang/OTP, Ericsson AB