term_to_binary/2 with atom cache and/or pid_info/1

Tue Mar 23 15:46:19 CET 2021

On Mon, Mar 22, 2021 at 12:08 PM Loïc Hoguin <lhoguin@REDACTED> wrote:

> Hello,
>
> Currently the Erlang Term Format has two variants:
>
>  * the full featured format that includes different forms of atom caches
>
>  * the simpler term_to_binary/1 format that does not
>
> This is not a satisfying state of affairs: sometimes we want to use
> term_to_binary/1 for protocols or when exchanging data, but the lack
> of atom cache can result in us sending a lot of 'undefined' atoms in
> string form.
>
>   => Should term_to_binary/1 allow setting up an atom cache?
>      Perhaps the cache could be maintained as a map to be encoded
>      separately by the user. This could also allow predefining
>      the most common atoms that could then never be sent (for
>      example #{undefined => 1, true => 2, false => 3}). Whatever
>      the interface we should reuse as much of the distribution
>      header atom cache code as possible.
>
> An alternative would be to build our own format loosely based on the
> Erlang Term Format. But in that scenario we end up lacking at least
> the pid_info/1 and ref_info/1 functions that would allow us to encode
> a pid/reference without having to use either term_to_binary/1 or
> {pid,ref}_to_list/1. On the other side the pid/reference can be
> recomposed via a pid_from_info/1 or ref_from_info/1 type of function.
>
> These functions can be useful to have regardless of the answer to the
> first question above. For example pid_info/1 is used in Mnesia here:
>
>
> https://github.com/erlang/otp/blob/master/lib/mnesia/src/mnesia_locker.erl#L1270
>
> And also in RabbitMQ here, as well as pid_from_info/1:
>
>
> https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/pid_recomposition.erl
>
> I've also been writing similar code when experimenting with custom
> distribution drivers.
>
>   => Should erlang:pid_info/1 and erlang:pid_from_info/1 be added?
>      This is the strongest case as there's code in the wild
>      already doing this.
>
>   => Should erlang:ref_info/1 and erlang:ref_from_info/1 be added?
>
> It's possible that ports and funs may benefit as well, but I have
> a hard time figuring out when we would want to use a port that
> way, and funs I believe that we already have everything we need
> as long as they're not anonymous funs.
>
> Cheers,
>
> --
> Loïc Hoguin
>
>
It is a bit unfortunate that the "creation" value of the node part is so
well hidden since the full identifier of a node is its nodename together
with its creation. It would have been nice if the node/1 BIF had returned
'{Nodename, Creation}' instead of just 'Nodename', but that is too late to
change now. Perhaps a nid/1 BIF?
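
As a rough illustration of what such a nid/1 could return, the creation
can already be dug out by going through the external term format. The
sketch below is only an assumption of how it might look: the module name
nid_sketch and the function nid/1 are hypothetical, it handles pids only,
and it relies on the PID_EXT and NEW_PID_EXT layouts of the external term
format.

```erlang
-module(nid_sketch).
-export([nid/1]).

%% Hypothetical nid/1 (not an existing BIF): returns {Nodename, Creation}
%% for a pid by inspecting its external term format encoding.
nid(Pid) when is_pid(Pid) ->
    Node = node(Pid),
    %% Encode the node name on its own so we know which bytes to match.
    %% Assumption: the atom is encoded the same way inside the pid.
    <<131, NodeBin/binary>> = term_to_binary(Node),
    Len = byte_size(NodeBin),
    case term_to_binary(Pid) of
        %% NEW_PID_EXT (tag 88): 32-bit creation.
        <<131, 88, NodeBin:Len/binary, _Id:32, _Serial:32, Creation:32>> ->
            {Node, Creation};
        %% PID_EXT (tag 103): 8-bit creation, emitted by older releases.
        <<131, 103, NodeBin:Len/binary, _Id:32, _Serial:32, Creation:8>> ->
            {Node, Creation}
    end.
```

Calling nid_sketch:nid(self()) returns a tuple of the node name and an
integer creation. A real BIF would of course not need the detour via
term_to_binary/1, which is the point of the suggestion.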

Currently, pids, ports and references are the data types that contain node
identifiers, and they are also the types that the node/1 BIF accepts.

I think it is reasonable to add functionality for creating such data types
from full information, so that alternative protocols won't have to go via
the external term format.
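
To make the detour concrete, here is a minimal sketch of how a pid can be
taken apart and rebuilt today via the external term format. It is not a
proposed implementation: the module name pid_parts is hypothetical, the
function names pid_info/1 and pid_from_info/1 are simply borrowed from the
proposal quoted above, and the code assumes a release whose
binary_to_term/1 understands NEW_PID_EXT.

```erlang
-module(pid_parts).
-export([pid_info/1, pid_from_info/1]).

%% Hypothetical helper: split a pid into {Node, Id, Serial, Creation}.
pid_info(Pid) when is_pid(Pid) ->
    Node = node(Pid),
    %% Assumption: the node atom is encoded identically inside the pid.
    <<131, NodeBin/binary>> = term_to_binary(Node),
    Len = byte_size(NodeBin),
    case term_to_binary(Pid) of
        %% NEW_PID_EXT (tag 88): 32-bit creation.
        <<131, 88, NodeBin:Len/binary, Id:32, Serial:32, Creation:32>> ->
            {Node, Id, Serial, Creation};
        %% PID_EXT (tag 103): 8-bit creation, emitted by older releases.
        <<131, 103, NodeBin:Len/binary, Id:32, Serial:32, Creation:8>> ->
            {Node, Id, Serial, Creation}
    end.

%% Hypothetical helper: rebuild a pid from the parts above.
%% Always emits NEW_PID_EXT, since it can carry any creation value.
pid_from_info({Node, Id, Serial, Creation}) when is_atom(Node) ->
    <<131, NodeBin/binary>> = term_to_binary(Node),
    binary_to_term(<<131, 88, NodeBin/binary, Id:32, Serial:32, Creation:32>>).
```

With this, pid_parts:pid_from_info(pid_parts:pid_info(self())) gives back
self(). Dedicated functions would make the same round trip possible
without encoding and decoding a binary.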

Regards,
Rickard
-- 
Rickard Green, Erlang/OTP, Ericsson AB