term_to_binary/2 with atom cache and/or pid_info/1

Mon Mar 22 12:08:07 CET 2021

Hello,

Currently the External Term Format has two variants:

 * the full-featured format that includes different forms of atom caches

 * the simpler term_to_binary/1 format that does not

This is not a satisfying state of affairs: sometimes we want to use
term_to_binary/1 for protocols or when exchanging data, but the lack
of an atom cache can result in us sending a lot of 'undefined' atoms
in string form.

  => Should term_to_binary/1 allow setting up an atom cache?
     Perhaps the cache could be maintained as a map to be encoded
     separately by the user. This could also allow predefining
     the most common atoms so that they never need to be sent (for
     example #{undefined => 1, true => 2, false => 3}). Whatever
     the interface, we should reuse as much of the distribution
     header atom cache code as possible.
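
To put a rough number on the overhead described above, here is a
minimal sketch that measures what each extra occurrence of an atom
costs with term_to_binary/1. The module and function names are made
up for illustration and are not part of OTP.

```erlang
-module(atom_cost).
-export([per_atom_bytes/1]).

%% Illustrative helper (not an OTP function): number of bytes that one
%% more occurrence of Atom adds to the output of term_to_binary/1.
%% A two-element list differs from a one-element list only by one
%% additional encoded atom, so the size difference is the per-atom cost.
per_atom_bytes(Atom) when is_atom(Atom) ->
    One = byte_size(term_to_binary([Atom])),
    Two = byte_size(term_to_binary([Atom, Atom])),
    Two - One.
```

atom_cost:per_atom_bytes(undefined) should come out at around 11 or
12 bytes depending on the OTP release, whereas an ATOM_CACHE_REF in
the distribution format is documented as a tag plus a 1-byte index.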

An alternative would be to build our own format loosely based on the
External Term Format. But in that scenario we end up lacking at least
the pid_info/1 and ref_info/1 functions that would allow us to encode
a pid/reference without having to use either term_to_binary/1 or
{pid,ref}_to_list/1. On the receiving side, the pid/reference can be
recomposed via a pid_from_info/1 or ref_from_info/1 type of function.
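
As an illustration of what such a pair has to do today, here is a
minimal sketch that goes through term_to_binary/1 and picks the
binary apart. It assumes a release where pids are encoded as
NEW_PID_EXT (tag 88: node atom, then 32-bit ID, Serial and Creation);
the module name and the shape of the returned map are hypothetical,
not a proposed final interface.

```erlang
-module(pid_info_sketch).
-export([pid_info/1, pid_from_info/1]).

%% Hypothetical stand-in for the proposed erlang:pid_info/1.
%% Assumes NEW_PID_EXT; crashes with badmatch on any other encoding.
pid_info(Pid) when is_pid(Pid) ->
    Bin = term_to_binary(Pid),
    %% Layout: 131, 88, <encoded node atom>, ID:32, Serial:32, Creation:32.
    %% The three trailing fields are fixed size, so the node atom takes
    %% whatever remains after the 2 leading and 12 trailing bytes.
    NodeSize = byte_size(Bin) - 2 - 12,
    <<131, 88, _NodeBin:NodeSize/binary,
      Id:32, Serial:32, Creation:32>> = Bin,
    #{node => node(Pid), id => Id, serial => Serial, creation => Creation}.

%% Hypothetical stand-in for the proposed erlang:pid_from_info/1.
pid_from_info(#{node := Node, id := Id,
                serial := Serial, creation := Creation}) ->
    %% Reuse term_to_binary/1 to get the node atom in external form,
    %% dropping the leading version byte (131).
    <<131, NodeBin/binary>> = term_to_binary(Node),
    binary_to_term(<<131, 88, NodeBin/binary,
                     Id:32, Serial:32, Creation:32>>).
```

With this, pid_from_info(pid_info(self())) is expected to give back
self(). Having to round-trip through the external format just to
read four fields is the part a built-in function would avoid.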

These functions can be useful to have regardless of the answer to the
first question above. For example, an equivalent of pid_info/1 is
implemented in Mnesia here:

 https://github.com/erlang/otp/blob/master/lib/mnesia/src/mnesia_locker.erl#L1270

RabbitMQ implements equivalents of both pid_info/1 and
pid_from_info/1 here:

 https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/pid_recomposition.erl

I've also been writing similar code when experimenting with custom
distribution drivers.

  => Should erlang:pid_info/1 and erlang:pid_from_info/1 be added?
     This is the strongest case as there's code in the wild
     already doing this.

  => Should erlang:ref_info/1 and erlang:ref_from_info/1 be added?
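
The same approach can be sketched for references. This assumes the
NEWER_REFERENCE_EXT encoding (tag 90: 16-bit word count, node atom,
32-bit Creation, then that many 32-bit ID words); the module name
and the map keys are again hypothetical.

```erlang
-module(ref_info_sketch).
-export([ref_info/1, ref_from_info/1]).

%% Hypothetical stand-in for the proposed erlang:ref_info/1.
%% Assumes NEWER_REFERENCE_EXT; crashes with badmatch otherwise.
ref_info(Ref) when is_reference(Ref) ->
    <<131, 90, Len:16, Rest/binary>> = term_to_binary(Ref),
    IdSize = Len * 4,
    %% Rest is: <encoded node atom>, Creation:32, Len ID words.
    NodeSize = byte_size(Rest) - 4 - IdSize,
    <<_NodeBin:NodeSize/binary, Creation:32, IdBin:IdSize/binary>> = Rest,
    #{node => node(Ref),
      creation => Creation,
      id => [W || <<W:32>> <= IdBin]}.

%% Hypothetical stand-in for the proposed erlang:ref_from_info/1.
ref_from_info(#{node := Node, creation := Creation, id := Words}) ->
    <<131, NodeBin/binary>> = term_to_binary(Node),
    Len = length(Words),
    IdBin = << <<W:32>> || W <- Words >>,
    binary_to_term(<<131, 90, Len:16, NodeBin/binary,
                     Creation:32, IdBin/binary>>).
```

As with pids, ref_from_info(ref_info(make_ref())) is expected to
return a reference equal to the original one.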

It's possible that ports and funs may benefit as well, but I have
a hard time figuring out when we would want to use a port that
way, and for funs I believe we already have everything we need
as long as they're not anonymous.
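
For funs of the form fun M:F/A, a sketch of that existing support
using erlang:fun_info/2 could look like this. The module and
function names are hypothetical, and the sketch deliberately rejects
anything that is not an external fun.

```erlang
-module(fun_info_sketch).
-export([fun_to_mfa/1, fun_from_mfa/1]).

%% Decompose an external fun (fun M:F/A) into an {M, F, A} tuple.
%% Crashes with badmatch for local or anonymous funs.
fun_to_mfa(Fun) when is_function(Fun) ->
    {type, external} = erlang:fun_info(Fun, type),
    {module, M} = erlang:fun_info(Fun, module),
    {name, F} = erlang:fun_info(Fun, name),
    {arity, A} = erlang:fun_info(Fun, arity),
    {M, F, A}.

%% Rebuild the external fun from its {M, F, A} description.
fun_from_mfa({M, F, A}) when is_atom(M), is_atom(F), is_integer(A) ->
    fun M:F/A.
```

For example, fun_to_mfa(fun lists:map/2) gives {lists, map, 2}, and
passing that tuple to fun_from_mfa/1 gives a fun that compares equal
to fun lists:map/2.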

Cheers,

-- 
Loïc Hoguin