term_to_binary/2 with atom cache and/or pid_info/1

Tue Mar 23 15:48:48 CET 2021

On Tue, Mar 23, 2021 at 3:46 PM Rickard Green <rickard@REDACTED> wrote:

> On Mon, Mar 22, 2021 at 12:08 PM Loïc Hoguin <lhoguin@REDACTED> wrote:
>
>> Hello,
>>
>> Currently the Erlang Term Format has two variants:
>>
>>  * the full featured format that includes different forms of atom caches
>>
>>  * the simpler term_to_binary/1 format that does not
>>
>> This is not a satisfying state of affairs: sometimes we want to use
>> term_to_binary/1 for protocols or when exchanging data, but the lack
>> of an atom cache can result in us sending a lot of 'undefined' atoms
>> in string form.
>>
>>   => Should term_to_binary/1 allow setting up an atom cache?
>>      Perhaps the cache could be maintained as a map to be encoded
>>      separately by the user. This could also allow predefining
>>      the most common atoms that could then never be sent (for
>>      example #{undefined => 1, true => 2, false => 3}). Whatever
>>      the interface we should reuse as much of the distribution
>>      header atom cache code as possible.
>>
>> An alternative would be to build our own format loosely based on the
>> Erlang Term Format. But in that scenario we end up lacking at least
>> the pid_info/1 and ref_info/1 functions that would allow us to encode
>> a pid/reference without having to use either term_to_binary/1 or
>> {pid,ref}_to_list/1. On the other side the pid/reference can be
>> recomposed via a pid_from_info/1 or ref_from_info/1 type of function.
>>
>> These functions can be useful to have regardless of the answer to the
>> first question above. For example pid_info/1 is used in Mnesia here:
>>
>>
>> https://github.com/erlang/otp/blob/master/lib/mnesia/src/mnesia_locker.erl#L1270
>>
>> And also in RabbitMQ here, as well as pid_from_info/1:
>>
>>
>> https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/pid_recomposition.erl
>>
>> I've also been writing similar code when experimenting with custom
>> distribution drivers.
>>
>>   => Should erlang:pid_info/1 and erlang:pid_from_info/1 be added?
>>      This is the strongest case as there's code in the wild
>>      already doing this.
>>
>>   => Should erlang:ref_info/1 and erlang:ref_from_info/1 be added?
>>
>> It's possible that ports and funs may benefit as well, but I have
>> a hard time figuring out when we would want to use a port that
>> way, and for funs I believe that we already have everything we
>> need as long as they're not anonymous funs.
>>
>> Cheers,
>>
>> --
>> Loïc Hoguin
>>
>>
> It is a bit unfortunate that the "creation" value of the node part is so
> well hidden since the full identifier of a node is its nodename together
> with its creation. It would have been nice if the node/1 BIF had returned
> '{Nodename, Creation}' instead of just 'Nodename', but that is too late to
> change now. Perhaps a nid/1 BIF?
>
> Currently pids, ports and references are the datatypes that contain node
> identifiers which also are the types the node/0 BIF can handle.
>
> I think it is reasonable to provide functionality for creating such data
> types from full information, so that alternative protocols won't have to go
> via the external term format.
>
> Regards,
> Rickard
> --
> Rickard Green, Erlang/OTP, Ericsson AB
>

> the types the node/0 BIF can handle

should have been: "the types the node/1 BIF can handle"
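
To make "full information" concrete, here is a minimal sketch of what a
non-Erlang peer has to do today to take a pid apart and put it back
together: go through the external term format. It is written in Python
purely for illustration and assumes the pid was encoded as NEW_PID_EXT
(tag 88), i.e. a node atom followed by 32-bit Id, Serial and Creation
fields. The function names pid_info and pid_from_info only mirror the
BIFs proposed in the quoted message; they are hypothetical and do not
exist in OTP.

```python
import struct

# Tags from the external term format.
VERSION = 131
NEW_PID_EXT = 88            # node atom, then Id:32, Serial:32, Creation:32
ATOM_EXT = 100              # 2-byte length, Latin-1
SMALL_ATOM_EXT = 115        # 1-byte length, Latin-1
ATOM_UTF8_EXT = 118         # 2-byte length, UTF-8
SMALL_ATOM_UTF8_EXT = 119   # 1-byte length, UTF-8


def decode_atom(buf, pos):
    """Decode one atom starting at pos; return (name, next_pos)."""
    tag = buf[pos]
    if tag in (ATOM_EXT, ATOM_UTF8_EXT):
        (length,) = struct.unpack_from(">H", buf, pos + 1)
        start = pos + 3
    elif tag in (SMALL_ATOM_EXT, SMALL_ATOM_UTF8_EXT):
        length = buf[pos + 1]
        start = pos + 2
    else:
        raise ValueError("unsupported atom tag: %d" % tag)
    encoding = "latin-1" if tag in (ATOM_EXT, SMALL_ATOM_EXT) else "utf-8"
    end = start + length
    return buf[start:end].decode(encoding), end


def pid_info(data):
    """Hypothetical helper: split an encoded pid into its four parts."""
    if data[0] != VERSION or data[1] != NEW_PID_EXT:
        raise ValueError("expected version byte 131 followed by NEW_PID_EXT")
    node, pos = decode_atom(data, 2)
    pid_id, serial, creation = struct.unpack_from(">III", data, pos)
    return {"node": node, "id": pid_id, "serial": serial,
            "creation": creation}


def pid_from_info(info):
    """Hypothetical inverse: rebuild the encoded pid from its parts."""
    name = info["node"].encode("utf-8")
    if len(name) > 255:
        raise ValueError("sketch only handles node names up to 255 bytes")
    header = bytes([VERSION, NEW_PID_EXT, SMALL_ATOM_UTF8_EXT, len(name)])
    return header + name + struct.pack(
        ">III", info["id"], info["serial"], info["creation"])
```

Note that the nodename alone is not enough to rebuild the pid; the
creation has to travel with it, which is the point made above about the
full identifier of a node.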

-- 
Rickard Green, Erlang/OTP, Ericsson AB