<div dir="ltr"><div>EEP 53 has been updated due to decisions made by the OTP technical board. See the "change log" section in the EEP. After these changes, the EEP has also been accepted by the OTP technical board. <br></div><div><br></div><div>The updated PR 2735 <<a href="https://github.com/erlang/otp/pull/2735">https://github.com/erlang/otp/pull/2735</a>> containing the prototype implementation has also been merged into the master branch.<br></div><div><br></div><div>-- <br><div dir="ltr" class="gmail_signature">Rickard Green, Erlang/OTP, Ericsson AB</div></div><div><br></div><div>    Author: Rickard Green <rickard(at)erlang(dot)org><br>    Status: Accepted/24.0 Proposal is to be implemented in OTP release 24<br>    Type: Standards Track<br>    Erlang-Version: 24.0<br>    Created: 01-Sept-2019<br>    Post-History:<br>****<br>EEP 53: Process aliases preventing late replies reaching clients<br>----<br><br><br>Abstract<br>========<br><br>Currently there exists no lightweight mechanism for preventing late replies<br>from a server to a client after a timeout or connection loss has occurred.<br>The only way to prevent late replies today is to make the request via<br>a proxy process.<br><br>The proposed process alias feature is a lightweight mechanism that solves<br>the above problem. A process alias is similar to a registered name that<br>is used temporarily while a request is outstanding. If the request times<br>out or the connection to the server is lost, the alias is deactivated, which<br>prevents a late reply from reaching the client.<br><br>Copyright<br>=========<br><br>This document has been placed in the public domain.<br><br><br>Specification<br>=============<br><br>An alias is of the Erlang type `reference()` and can be used as a destination<br>when sending using the `!` operator, or when sending using the `erlang:send()`<br>and `erlang:send_nosuspend()` BIFs. An alias can be used both on the local<br>node and on remote nodes in a distributed system. 
The alias identifies a<br>process that exists, or has existed, on the node with the node name returned<br>by `node(Alias)`.<br><br>As of this change, all references will be accepted as destinations in the<br>message send operations listed above. If the reference is not an alias, or is<br>a previous alias that has been deactivated, the message will silently be<br>dropped.<br><br>These new BIFs are introduced:<br><br>* `alias/0`, `alias/1`. The `alias()` BIF creates and returns an alias which<br>  can be used when sending messages to the process that called the `alias()`<br>  BIF. The `alias/1` BIF takes an option list as argument with the following<br>  accepted options:<br>    * `explicit_unalias` - The alias will remain until it has been<br>      deactivated by the `unalias/1` BIF.<br>    * `reply` - The alias will be automatically deactivated when a reply<br>      message sent using the alias is received.<br><br>* `unalias/1`. The `unalias(Alias)` BIF deactivates an alias that identifies<br>  the calling process. The BIF returns `true` if the alias `Alias` identified<br>  the calling process and thus was deactivated; otherwise, no change of the<br>  alias state was made and `false` is returned.<br><br>* `monitor/3`. The `monitor/3` BIF is an extension of the `monitor/2` BIF<br>  where the third argument is an option list. As of its introduction it<br>  accepts two options:<br>    * `{alias, UnaliasOpt}`. The first element of the two-tuple indicates<br>      that we want the returned monitor reference to also work as an alias.<br>      The second element determines how the alias should be deactivated:<br>        * `explicit_unalias` - The alias will remain until it has been<br>          deactivated by the `unalias/1` BIF.<br>        * `demonitor` - The alias will be deactivated when the monitor is<br>          deactivated. 
That is, either when the `demonitor()` BIF is called<br>          on the monitor, or when the monitor is automatically deactivated<br>          by the reception of a `'DOWN'` message. The alias can still be<br>          deactivated before this happens by calling the `unalias/1` BIF.<br>        * `reply_demonitor` - The alias will be deactivated when either<br>          the monitor is deactivated or a message that has been passed<br>          using the alias is received. If the alias is deactivated due to<br>          a message passed using the alias, the monitor is also deactivated<br>          as if the `demonitor()` BIF had been called.<br><br>    * `{tag, UserDefinedTag}`. This will replace the default `Tag` with<br>      `UserDefinedTag` in the monitor message delivered when the monitor is<br>      triggered. For example, when monitoring a process, the `'DOWN'` tag in<br>      the down message will be replaced by `UserDefinedTag`. <br><br>The `spawn_opt()` and `spawn_request()` BIFs have also been extended to<br>accept an option `{monitor, MonitorOpts}` where `MonitorOpts` corresponds to<br>the option list of the `monitor/3` BIF.<br><br>Full documentation of these BIFs and options can be found via<br>[pull request #2735](<a href="https://github.com/erlang/otp/pull/2735">https://github.com/erlang/otp/pull/2735</a>)<br>containing the reference implementation.<br><br>It is not possible to retrieve the process identifier of the process<br>identified by an alias, and it is not possible to test if a reference is an<br>alias or not.<br><br><br>Motivation<br>==========<br><br>As previously stated, it is possible to prevent late replies by using a<br>proxy process that forwards the reply to the client. By spawning the proxy<br>process and sending its process identifier to the server instead of the<br>client's own process identifier, the proxy can be terminated when the<br>operation times out or the connection is lost. 
Since the proxy process<br>is no longer alive, a reply will be silently dropped and no stray message<br>will reach the previous client of the request. This, however, makes<br>the code both more complicated and less efficient than it needs to be. The<br>inefficiency comes from both the need to create, schedule, execute, and<br>terminate the proxy process and the extra copying of data over the proxy<br>process.<br><br>When the author of the client code has full control over the client process,<br>such late replies can be handled without a proxy since the code can be<br>aware of these potential stray messages and drop them when received. This<br>is, however, not possible when implementing library code. You then either<br>need to use a proxy process, as done by the `gen_statem` behavior, or<br>accept that the client process may get stray messages after a call, as<br>done by the `gen_server` behavior.<br><br>Process aliases solve these issues with a very small overhead.<br><br><br>Rationale<br>=========<br><br>Why use the reference data type for alias?<br>------------------------------------------<br><br>This is more or less what the reference data type is there for: a data type<br>that can identify a huge number of different entities. References are unique<br>and contain a node identifier identifying the node they originate from.<br>This makes it easy to identify a specific process on a specific node while<br>also identifying different aliases created by the same process. The embedded<br>node identifier makes it easy to provide distribution transparency.<br><br>Why not make alias an opaque data type?<br>---------------------------------------<br><br>The expected most common use case is a client server request, such as<br>`gen_server:call()`. Client server requests in Erlang are typically made<br>while monitoring the server from the client. 
In order to minimize the data<br>produced and sent in the request we want to reuse the reference created for<br>identification of the monitor to also function as an alias. Since the monitor<br>identifier is documented as a reference and is not opaque (which one can<br>argue was a design mistake when introducing monitors), it becomes hard not<br>to document the type of an alias as a reference as well.<br><br>Why not allow references as registered names in the already existing API?<br>-------------------------------------------------------------------------<br><br>There are two reasons: distribution transparency and scalability.<br><br>Distribution transparency is really desirable since the user can use the<br>functionality the same way regardless of whether it is a node local operation<br>or node remote operation. The name registration API is not distribution<br>transparent.<br><br>Regarding scalability: due to how the name registration API has been designed,<br>we need some sort of table in order to implement the API. This table will be<br>written to and read from by processes that are executing in parallel. In the<br>use case we are focusing on, names (aliases) are expected to be temporary and<br>created in huge amounts. That is, there will be large amounts of modifications<br>of this table from processes executing on different processors. This will<br>make it challenging to implement such a table that scales well.<br><br>In the proposed solution the information needed to route the message to the<br>correct place is saved in the alias itself, the reference. The information<br>needed to determine if the message passed via the alias should be dropped or<br>passed along is saved in the process identified by the alias. That is, all<br>information needed is distributed to where it is needed instead of being<br>centralized in a node global table. 
This approach of distributed information<br>introduces no new synchronization points at all when it has been fully<br>implemented (more on that below), which will scale extremely well. An<br>implementation based on a node global table can *never* compete<br>scalability-wise with that.<br><br>The already existing functionality for registered names cannot be implemented<br>using this distributed information approach, but needs this centralized<br>storage of names. That is, the already existing API cannot be used.<br><br>Besides the node identifier, a reference today contains three 32-bit words of<br>data, or in other words 96 bits of data. Of these 96 bits only 82 bits are<br>allowed to be passed over the distribution to another node. This is for<br>historical reasons. While a reference resides locally it can, however, contain<br>a more or less unlimited amount of data. 82 bits are not enough to make a<br>reference unique on the local node and at the same time uniquely identify a<br>node local process. In order to be able to store all information needed in an<br>alias, the reference data type needs to be extended.<br><br>In the proposed solution references used as aliases are extended to use<br>five 32-bit words on 64-bit architectures and four 32-bit words on 32-bit<br>architectures. Since that much data in a reference cannot be passed over<br>the distribution today, the reference implementation saves aliases that<br>are alive in a node global table. When a node local alias enters the local<br>node over the distribution one needs to look it up in this table in order to<br>be able to restore it to its actual value. While aliases are passed around<br>locally there is no need for look-ups in this table.<br><br>The reference implementation also modifies the distribution protocol to<br>allow references with up to five 32-bit values. For backwards compatibility<br>reasons this modification of the distribution protocol cannot be used at once<br>when aliases are introduced. 
This is because we need to be able to communicate with<br>older nodes from previous releases for a while. When this has been living in<br>the system for enough time (expected to be OTP 26) we can begin sending<br>references with up to five 32-bit words and remove the usage of the table<br>mapping references over the distribution to aliases. That is, it is not until<br>this happens that the alias implementation is fully complete.<br><br>Why is it not possible to get the PID of the process that an alias refers to?<br>-----------------------------------------------------------------------------<br><br>Most importantly, there is no need to know the PID of the process that an<br>alias refers to in order to solve the problems that aliases are intended<br>to solve. The user is expected to utilize aliases in a protocol where one knows<br>whether a reference is an alias or not and should not need to know the PID of<br>the process that it refers to.<br><br>Besides the above there are also other issues with such functionality. The<br>content of a reference is just a large integer. In order to keep distribution<br>transparency one would either have to specify how this integer should be<br>interpreted or require synchronous signaling with the node where the<br>identified process resides. The synchronous signaling will be very<br>expensive. By specifying how the reference integer should be interpreted we<br>would prevent future changes to how the integer of the reference should be<br>interpreted, which might prevent future optimizations, improvements and new<br>features. 
Up until the time when large references with five 32-bit words can<br>be passed over the distribution, synchronous communication is also the only<br>option for implementing such functionality.<br><br>If we should mimic the `whereis()` function of the registered name API where<br>you also can see if a name is currently registered, no other option than<br>synchronous signaling with the process identified by the alias is possible.<br><br>Why is it not possible to test if a reference is an alias?<br>----------------------------------------------------------<br><br>The same reason as to why it is not possible to get the PID of the<br>process that is referred to by an alias.<br><br>Why not allow registration of arbitrary Erlang terms instead?<br>-------------------------------------------------------------<br><br>Such a feature could solve the same issue that aliases are intended to<br>solve, but there are problems with such an approach.<br><br>Terms other than pids, ports, and references do not have a node identifier<br>embedded into the data type. For such data types you need some other way<br>to identify the node where the name is registered. In the current case<br>of atoms as registered names, this is done by wrapping the name in a<br>two-tuple that contains the node name. Something like this is needed for<br>all other terms than just plain pids, ports, and references. This also<br>introduces a problem. Is a two-tuple just a name or a name plus a node<br>identifier?<br><br>Should it be possible to register a PID as a name for another process?<br>This would force all send operations to first look up the PID in the<br>table of registered names before performing the operation. This will<br>cost performance in all send operations. The same is true for ports.<br><br>We don't think registration of arbitrary terms should be implemented<br>due to the problems that arise. 
The current registration feature, which only<br>allows atoms, can however be a bit too limiting when you need to register<br>a number of processes for the same service. An option could be to allow<br>registration of two-tuples containing an atom and an integer. Perhaps<br>other terms such as strings should also be allowed, but arbitrary terms<br>should not be allowed.<br><br>Allowing references as registered names implies scalability bottlenecks<br>not present in the alias API. That is, this would be an inferior solution<br>to the problem we set out to solve.<br><br>One probably wants to extend name registration with more allowed terms<br>than just atoms, but that would be for solving other problems than what aliases<br>are intended to solve. The name registration API does not fit aliases<br>so we don't see that aliases should be combined with such an extension<br>of the registration API. The alias solution solves the problem we set out<br>to solve, so this EEP is limited to that.<br><br>Why is the tag option of monitor/3 introduced?<br>----------------------------------------------<br><br>When using the monitor option `alias` in a `spawn_request()` call you<br>get unnecessary delays since you cannot share the alias with the<br>child process until you have gotten the spawn reply with the process<br>identifier of the child process. You instead typically want to<br>explicitly create the alias before the `spawn_request()` call and pass<br>it as an argument to the child process.<br><br>In a typical scenario you want to receive a response or an error<br>of the operation. However, if you explicitly create an alias before<br>the `spawn_request()` operation, the monitor reference and the alias<br>will be different references. 
This will prevent the compiler from<br>optimizing the receive (to skip messages present in the message queue<br>when the reference was created) since not all receive clauses will<br>match on the same reference.<br><br>We solve this by using the `tag` monitor option as well as the<br>`reply_tag` spawn request option. The following is a fully functional rpc<br>implementation using this method on a system with the prototype<br>implementation of aliases:<br><br>    rpc(Node, M, F, A) -><br>        Alias = alias([reply]),<br>        ReqId = spawn_request(Node,<br>                              fun () -><br>                                      Result = apply(M, F, A),<br>                                      Alias ! {{result, Alias}, Result}<br>                              end,<br>                              [{monitor, [{tag, {'DOWN', Alias}}]},<br>                               {reply_tag, {spawn_reply, Alias}},<br>                               {reply, error_only}]),<br>        receive<br>            {{result, Alias}, Result} -><br>                demonitor(ReqId, [flush]),<br>                Result;<br>            {{'DOWN', Alias}, ReqId, process, _, Error} -><br>                rpc_error_cleanup(Alias, Error);<br>            {{spawn_reply, Alias}, ReqId, error, Error} -><br>                rpc_error_cleanup(Alias, Error)<br>        end.<br><br>    rpc_error_cleanup(Alias, Error) -><br>        case unalias(Alias) of<br>            true -><br>                %% No flush needed since we used the 'reply' option<br>                %% to alias(), and the alias was still active...<br>                error({rpc_error, Error});<br>            false -><br>                %% Flush a possible result message...<br>                receive {{result, Alias}, Result} -> Result<br>                after 0 -> error({rpc_error, Error})<br>                end<br>        end.<br><br>The `tag` monitor option can be used in other situations as<br>well in order to get a single reference that is 
present in<br>all types of responses from a group of processes. The processes<br>may be pre-existing or not. This reference can then be utilized<br>to determine if a message corresponds to a specific operation<br>made to a specific group of processes.<br><br>There are plans to extend the receive optimization so that multiple<br>receives matching on the same reference in all clauses can utilize<br>the optimization. This will also improve performance for such<br>implementations receiving multiple messages matching on the same<br>reference.<br><br>The tag to use in the monitor message is stored locally in the<br>process that sets up the monitor and does not have to be<br>communicated between processes. Most importantly it does not<br>have to be sent over the wire in the distributed case. This<br>means that it can also be used when monitoring processes on older<br>nodes which do not support this functionality.<br><br>Backwards Compatibility<br>=======================<br><br>The alias feature is a pure extension, so there are no real backwards<br>compatibility issues.<br><br>In order to be able to communicate aliases over Erlang nodes from<br>previous releases, we cannot pass large references over the distribution<br>and therefore need to keep information about aliases in a node global<br>table. The implementation benefits from being able to pass larger<br>references over the distribution, but will not do so until we can make<br>it mandatory to be able to handle such large references. Both OTP 24<br>and OTP 25 will be able to handle large references over the distribution<br>and since we only guarantee distribution compatibility with the two<br>closest releases backwards and forwards we can then make large<br>references mandatory in OTP 26.<br><br>This node global table for aliases introduces an overhead when utilizing<br>aliases compared to sending using the PID of the process. This is due<br>to allocation and manipulation of table structures. 
Compared to the<br>existing solution of utilizing a proxy process in order to<br>prevent stray messages, the overhead of this node global table for<br>aliases is small. Fortunately this node global table also only needs to<br>be present temporarily and can be removed in OTP 26.<br><br>Reference Implementation<br>========================<br><br>The reference implementation is provided by<br>[pull request #2735](<a href="https://github.com/erlang/otp/pull/2735">https://github.com/erlang/otp/pull/2735</a>).<br><br>Besides the implementation of the alias feature, the pull request also contains<br>usage of aliases in the gen behaviors such as gen_server. Due to this, it is<br>now also possible to implement `receive_response()` functionality similar to<br>`erpc:receive_response()`, which has also been implemented:<br><br>* `gen_server:receive_response/2`<br>* `gen_statem:receive_response/2`<br>* `gen_event:receive_response/2`<br><br>Change Log<br>==========<br><br>* 2020-10-29: The `tag` monitor option was introduced.<br>* 2020-11-12: The `once` option of `alias/1` was changed to<br>              `reply`. The `unalias` option of `alias/1` was<br>              changed to `explicit_unalias`. In the `UnaliasOpt`<br>              part of the `alias` monitor option, `unalias`<br>              was changed to `explicit_unalias`.<br>* 2020-11-12: The state of the EEP was changed to Accepted.<br><br>[EmacsVar]: <> "Local Variables:"<br>[EmacsVar]: <> "mode: indented-text"<br>[EmacsVar]: <> "indent-tabs-mode: nil"<br>[EmacsVar]: <> "sentence-end-double-space: t"<br>[EmacsVar]: <> "fill-column: 70"<br>[EmacsVar]: <> "coding: utf-8"<br>[EmacsVar]: <> "End:"<br>[VimVar]: <> "vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4:"<br></div><div><br></div><div><br></div></div>