EEP 53: Process aliases preventing late replies reaching clients

Rickard Green rickard@REDACTED
Wed Sep 2 18:30:34 CEST 2020


On Wed, Sep 2, 2020 at 5:32 PM Fred Hebert <mononcqc@REDACTED> wrote:

> A lot of this seems neat and useful.
> I am however getting a bit of an odd feeling from this section:
>
> * `monitor/3`. The `monitor/3` BIF is an extension of the `monitor/2` BIF
>>   where the third argument is an option list. As of its introduction the
>>   only accepted option is '{alias, unalias | demonitor |
>> reply_demonitor}'.
>>   The first element of the two tuple indicates that we want the returned
>>   monitor reference to also work as an alias. The second element
>> determines
>>   how the alias should be deactivated:
>>   * `unalias` - The alias will remain until it has been deactivated by the
>>     `unalias/1` BIF
>>   * `demonitor` - The alias will be deactivated when the monitor is
>>     deactivated. That is, either when the `demonitor()` BIF is called on
>>     the monitor, or when the monitor is automatically deactivated by the
>>     reception of a `'DOWN'` message. The alias can still be deactivated
>>     before this happens by calling the `unalias/1` BIF.
>>   * `reply_demonitor` - The alias will be deactivated when either the
>>     monitor is deactivated or a message that has been passed using the
>>     alias is received. If the alias is deactivated due to a message passed
>>     using the alias, the monitor is also deactivated as if the
>> `demonitor()`
>>     BIF had been called.
>
>
> so the tuple is `{alias, TeardownMethod}`. How is `reply_demonitor`
> supposed to work with regards to "received"? Is it removed when the message
> is inserted in the mailbox (and therefore can be faster than demonitor +
> [flush]), when it is processed by a receive expression, or when it exactly
> matches?
>

It is handled between placement in the signal queue and placement in the
message queue which is before a receive expression can access it. As for
all other signals some processing is needed in order to determine what to
do with them. They first enter the signal queue where they are processed by
the receiver. Ordinary messages signals and other signals that are to be
converted into message signals are from there moved into the message queue
by the receiving process itself (you can look at it as the "system" is
doing this as well; the erlang user cannot tell the difference anyway).
Once in the message queue a receive expression can access the message.

For example, when a 'DOWN' signal is received the receiving process needs
to determine what to do with it. If the monitor is still active it is
converted into a 'DOWN' message and then moved into the message queue;
otherwise, dropped. Alias message signals are handled at the same place in
the same manner. If the alias is still active the payload of the signal is
converted into an ordinary message signal and moved into the message queue;
otherwise, dropped.

The 'demonitor' and 'reply_demonitor' just adds some automatic unaliasing
and demonitoring at specific events. 'demonitor' when the corresponding
monitor is deactivated. That is, when demonitor() is called or when a
'DOWN' signal corresponding to the monitor is received. 'reply_demonitor'
when a response is received or the monitor is explicitly demonitored. That
is, either when a reply via the alias is received, a 'DOWN' signal for the
corresponding monitor is received, or when the monitor is explicitly
demonitored. Note that 'reply_demonitor' also automatically deactivates the
monitor. This makes the need for cleanup code almost non-existent and is
also (potentially much) more efficient when cleanup otherwise would be
needed. This since you might have to scan the message queue when cleaning
up.

Would it actually be possible to extend this mechanism to regular monitors?
>
>
This is more or less used when you receive a 'DOWN' signal and the monitor
is automatically deactivated. The 'reply_demonitor' option also effect the
monitor.


> Wouldn't it make sense for `erlang:demonitor(Ref, Opts)` to have the
> option list contain `unalias` instead of specifying that when setting up
> the monitor?
>
>
You would then only get the automatic unaliasing at the scenario when the
monitor is explicitly deactivated using demonitor() and not when a 'DOWN'
signal is received.

Regards,
Rickard

On Wed, Sep 2, 2020 at 1:37 AM Rickard Green <rickard@REDACTED> wrote:
>
>>     Author: Rickard Green <rickard(at)erlang(dot)org>
>>     Status: Draft
>>     Type: Standards Track
>>     Erlang-Version: 24.0
>>     Created: 01-Sept-2019
>>     Post-History:
>> ****
>> EEP 53: Process aliases preventing late replies reaching clients
>> ----
>>
>>
>> Abstract
>> ========
>>
>> Currently there exists no lightweight mechanism for preventing late
>> replies
>> from a server to a client after a timeout or connection loss has occurred.
>> The only way to prevent late replies today is to make the request via
>> a proxy process.
>>
>> The proposed process alias feature is a lightweight mechanism that solves
>> the above problem. A process alias is similar to a registered name that
>> is used temporarily while a request is outstanding. If the request times
>> out or the connection to the server is lost, the alias is deactivated
>> which
>> prevents a late reply from reaching the client.
>>
>> Copyright
>> =========
>>
>> This document has been placed in the public domain.
>>
>>
>> Specification
>> =============
>>
>> An alias is of the Erlang type `reference()` and can be used as
>> destination
>> when sending using the `!` operator, or when sending using the
>> `erlang:send()`
>> and `erlang:send_nosuspend()` BIFs. An alias can be used both on local
>> node and on remote nodes in a distributed system. The alias identifies a
>> process that exist, or has existed, on the node with the node name
>> returned
>> by `node(Alias)`.
>>
>> All references will, as of this, be accepted as destination in the
>> message send
>> operations listed above. If the reference is not an alias or a previous
>> alias
>> that has been deactivated, the message will silently be dropped.
>>
>> These new BIFs are introduced:
>>
>> * `alias/0`, `alias/1`. The `alias()` BIF creates and returns an alias
>> which
>>   can be used when sending messages to the process that called the
>> `alias()`
>>   BIF.
>> * `unalias/1`. The `unalias(Alias)` BIF deactivates an alias that
>> identifies
>>   the calling process. The BIF returns `true` if the alias `Alias`
>> identified
>>   the calling process and thus was deactivated; otherwise, no change of
>> the
>>   alias state was made and `false` is returned.
>> * `monitor/3`. The `monitor/3` BIF is an extension of the `monitor/2` BIF
>>   where the third argument is an option list. As of its introduction the
>>   only accepted option is '{alias, unalias | demonitor |
>> reply_demonitor}'.
>>   The first element of the two tuple indicates that we want the returned
>>   monitor reference to also work as an alias. The second element
>> determines
>>   how the alias should be deactivated:
>>   * `unalias` - The alias will remain until it has been deactivated by the
>>     `unalias/1` BIF
>>   * `demonitor` - The alias will be deactivated when the monitor is
>>     deactivated. That is, either when the `demonitor()` BIF is called on
>>     the monitor, or when the monitor is automatically deactivated by the
>>     reception of a `'DOWN'` message. The alias can still be deactivated
>>     before this happens by calling the `unalias/1` BIF.
>>   * `reply_demonitor` - The alias will be deactivated when either the
>>     monitor is deactivated or a message that has been passed using the
>>     alias is received. If the alias is deactivated due to a message passed
>>     using the alias, the monitor is also deactivated as if the
>> `demonitor()`
>>     BIF had been called.
>>
>> The `spawn_opt()` and `spawn_request()` BIFs have also been extended to
>> accept an option `{monitor, MonitorOpts}` where `MonitorOpts` correspond
>> to
>> the option list of the `monitor/3` BIF.
>>
>> Full documentation of these BIFs and options can be found via
>> [pull request #2735](https://github.com/erlang/otp/pull/2735)
>> containing the reference implementation.
>>
>> It is not possible to retrieve the process identifier of the process
>> identified by an alias, and it is not possible to test if a reference is
>> an
>> alias or not.
>>
>>
>> Motivation
>> ==========
>>
>> As previously stated it is possible to prevent late replies by using a
>> proxy process that forwards the reply to the client. By spawning the proxy
>> process and send its process identifier to the server instead of the
>> clients own process identifier, the proxy can be terminated when the
>> operation times out or the connection is lost. Since the proxy process
>> is not alive, a reply will be silently dropped and no stray message
>> will reach the previous client of the request. This however both makes
>> the code more complicated and less efficient than it needs to be. The
>> inefficiency comes from both the need to create, schedule, execute, and
>> terminate the proxy process and the extra copying of data over the proxy
>> process.
>>
>> When the author of the client code has full control over the client
>> process
>> such late replies can be handled without a proxy since the code can be
>> aware of these potential stray messages and drop them when received. This
>> is, however, not possible when implementing library code. You then either
>> need to use a proxy process, as done by the `gen_statem` behavior, or
>> accept that the client process may get stray messages after a call, as
>> done by `gen_server` behavior.
>>
>> Process aliases solves these issues with a very small overhead.
>>
>>
>> Rationale
>> =========
>>
>> Why use the reference data type for alias?
>> ------------------------------------------
>>
>> This is more or less what the reference data type is there for. A data
>> type
>> that can identify a huge amount of different entities. References are
>> unique
>> and contain a node identifier identifying the the node it originates from.
>> This makes it easy to identify a specific process on a specific node while
>> also identifying different aliases created by the same process. The
>> embedded
>> node identifier makes it easy to provide distribution transparency.
>>
>> Why not make alias an opaque data type?
>> ---------------------------------------
>>
>> The expected most common use case is in a client server request. Such as
>> `gen_server:call()`. Client server requests in Erlang are typically made
>> while monitoring the server from the client. In order to minimize the data
>> produced and sent in the request we want to reuse the reference created
>> for
>> identification of the monitor to also function as an alias. Since the
>> monitor
>> identifier is documented as a reference and is not opaque (which one can
>> argue was a design mistake when introducing monitors), it becomes hard not
>> to document the type of an alias as a reference as well.
>>
>> Why not allow references as registered names in the already existing API?
>> -------------------------------------------------------------------------
>>
>> There are two reasons. Distribution transparency and scalability.
>>
>> Distribution transparency is really desirable since the user can use the
>> functionality the same way regardless of whether it is a node local
>> operation
>> or node remote operation. The name registration API is not distribution
>> transparent.
>>
>> Regarding scalability. Due to how the name registration API has been
>> designed
>> we need some sort of table in order to implement the API. This table will
>> be
>> written to and read from by processes that are executing in parallel. In
>> the
>> use case we are focusing on, names (aliases) are expected to be temporary
>> and
>> created in huge amounts. That is, there will be large amounts of
>> modifications
>> of this table from processes executing on different processors. This will
>> make it challenging to implement such a table that scales well.
>>
>> In the proposed solution the information needed to route the message to
>> the
>> correct place is saved in the alias itself, the reference. The information
>> needed to determine if the message passed via the alias should be dropped
>> or
>> passed along is saved in the process identified by the alias. That is, all
>> information needed is distributed to where it is needed instead of being
>> centralized in a node global table. This approach of distributed
>> information
>> introduce no new synchronization points at all when it has been fully
>> implemented (more on that below) which will scale extremely well. An
>> implementation based on a node global table can *never* compete
>> scalability
>> wise with that.
>>
>> The already existing functionality for registered names cannot be
>> implemented
>> using this distributed information approach, but needs this centralized
>> storage of names. That is, the already existing API cannot be used.
>>
>> Besides node identifier a reference today contains three 32-bit words of
>> data
>> or in other words 96-bits of data. Of these 96 bits only 82 bits are
>> allowed
>> to be passed over the distribution to another node. This for historical
>> reasons. While a reference resides locally it can however contain more or
>> less unlimited amount of data. 82-bits are not enough to make a reference
>> unique on the local node and at the same time uniquely identify a node
>> local
>> process. In order to be able to store all information needed in alias, the
>> reference data type needs to be extended.
>>
>> In the proposed solution references used as aliases are extended to use
>> five 32-bit words on 64-bit architectures and four 32-bit words on 32-bit
>> architectures. Since that much data in a reference cannot be passed over
>> the distribution today, the reference implementation saves aliases that
>> are alive in a node global table. When a node local alias enters the local
>> node over the distribution one needs to look it up in this table in order
>> to
>> be able to restore it to its actual value. While aliases are passed around
>> locally there is no need for look-ups in this table.
>>
>> The reference implementation also modifies the distribution protocol to
>> allow references with up to five 32-bit values. For backwards
>> compatibility
>> reasons this modification of the distribution protocol cannot be used at
>> once
>> when aliases are introduced. This since we need to be able to communicate
>> with
>> older nodes from previous releases for a while. When this has been living
>> in
>> the system for enough time (expected to be OTP 26) we can begin sending
>> references with up to five 32-bit words and remove the usage of the table
>> mapping references over the distribution to aliases. That is, it is not
>> until
>> this happens that the alias implementation is fully complete.
>>
>> Why is it not possible to get the PID of the process that an alias refers
>> to?
>>
>> -----------------------------------------------------------------------------
>>
>> Most importantly there is no need to know the PID of the process that an
>> alias refers to in order to solve the problems that alias are intended
>> to solve. The user is expected to utilize alias in a protocol where one
>> knows
>> whether a reference is an alias or not and should not need to know the
>> PID of
>> the process that it refers to.
>>
>> Besides the above there are also other issues with such functionality. The
>> content of a reference is just a large integer. In order to keep
>> distribution
>> transparency one would either have to specify how this integer should be
>> interpreted or require synchronous signaling with the node where the
>> identified process resides. The synchronous signal-ling will be very
>> expensive. By specifying how the reference integer should be interpreted
>> we
>> would prevent future changes to how the integer of the reference should be
>> interpreted which might prevent future optimizations, improvements and new
>> features. Up until the time when large references with five 32-bit words
>> can
>> be passed over the distribution, synchronous communication is also the
>> only
>> option on how to implement such functionality.
>>
>> If we should mimic the `whereis()` function of the registered name API
>> where
>> you also can see if a name is currently registered, no other option than
>> synchronous signaling with the process identified by the alias is
>> possible.
>>
>> Why is it not possible to test if a reference is an alias?
>> ----------------------------------------------------------
>>
>> The same reason as to why it is not possible to get the PID of the
>> process that is referred to by an alias.
>>
>> Why not allow registration of arbitrary Erlang terms instead?
>> -------------------------------------------------------------
>>
>> Such a feature could solve the same issue that aliases are intended to
>> solve, but there are problems with such an approach.
>>
>> Terms other than pids, ports, and references do not have a node identifier
>> embedded into the data type. For such data types you need some other way
>> to identify the node of where the name is registered. In the current case
>> of atoms as registered names, this is done by wrapping the name in a
>> two-tuple that contains the node name. Something like this is needed for
>> all other terms than just plain pids, ports, and references. This also
>> introduce a problem. Is a two-tuple just a name or a name plus a node
>> identifier?
>>
>> Should it be possible to register a PID as a name for another process?
>> This would force all send operations to first lookup the PID in the
>> table of registered names before performing the operation. This will
>> cost performance in all send operations. The same is true for ports.
>>
>> We don't think registration of arbitrary terms should be implemented
>> due to the problems that arise. Current registration feature that only
>> allows atoms can however be a bit too limiting when you need to register
>> a number of processes for the same service. An option could be to allow
>> registration of two-tuples containing an atom and an integer. Perhaps
>> other terms such as strings should also be allowed, but arbitrary terms
>> should not be allowed.
>>
>> Allowing references as registered names implies scalability bottlenecks
>> not present in the alias API. That is, this would be an inferior solution
>> to the problem we set out to solve.
>>
>> One probably wants to extend name registration with more allowed terms
>> than just atoms, but this for solving other problems than what aliases
>> are intended to solve. The name registration API does not fit aliases
>> so we don't see that aliases should be combined with such an extension
>> of the registration API. The alias solution solves the problem we set out
>> to solve, so this eep is limited to that.
>>
>>
>> Backwards Compatibility
>> =======================
>>
>> The alias feature is a pure extension, so there are no real backwards
>> compatibility issues.
>>
>> In order to be able to communicate aliases over Erlang nodes from
>> previous releases we cannot pass large references over the distribution
>> and therefore need to keep information about aliases in a node global
>> table. The implementation benefits from being able to pass larger
>> references over the distribution, but will not do so until we can make
>> it mandatory to be able to handle such large references. Both OTP 24
>> and OTP 25 will be able to handle large references over the distribution
>> and since we only guarantee distribution compatibility with the two
>> closest releases backwards and forwards we can then make large
>> references mandatory in OTP 26.
>>
>> This node global table for alias introduce an overhead when utilizing
>> aliases compared to sending using the PID of the process. This due
>> to allocation and manipulation of table structures. Comparing to the
>> existing solution of utilizing a proxy process in order to
>> prevent stray messages the overhead of this node global table for
>> aliases is small. Fortunately this node global table also only need to
>> be present temporarily and can be removed in OTP 26.
>>
>>
>> Reference Implementation
>> ========================
>>
>> The reference implementation is provided by
>> [pull request #2735](https://github.com/erlang/otp/pull/2735).
>>
>> Beside implementation of the alias feature. The pull request also contain
>> usage of aliases in the gen behaviors such as gen_server. Due to this it
>> is
>> now also possible to implement `receive_response()` functionality similar
>> to
>> `erpc:receive_response()` which also have been implemented:
>>
>> * `gen_server:receive_response/2`
>> * `gen_statem:receive_response/2`
>> * `gen_event:receive_response/2`
>>
>>
>> [EmacsVar]: <> "Local Variables:"
>> [EmacsVar]: <> "mode: indented-text"
>> [EmacsVar]: <> "indent-tabs-mode: nil"
>> [EmacsVar]: <> "sentence-end-double-space: t"
>> [EmacsVar]: <> "fill-column: 70"
>> [EmacsVar]: <> "coding: utf-8"
>> [EmacsVar]: <> "End:"
>> [VimVar]: <> "vim: set fileencoding=utf-8 expandtab shiftwidth=4
>> softtabstop=4:"
>>
>> --
>> Rickard Green, Erlang/OTP, Ericsson AB
>> _______________________________________________
>> eeps mailing list
>> eeps@REDACTED
>> http://erlang.org/mailman/listinfo/eeps
>>
>

-- 
Rickard Green, Erlang/OTP, Ericsson AB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/eeps/attachments/20200902/7e1e6ac9/attachment-0001.htm>


More information about the eeps mailing list