[erlang-questions] term_to_binary and record improvements

Fri Aug 29 04:57:50 CEST 2008

On 29 Aug 2008, at 6:47 am, Joe Armstrong wrote:
>   Suppose the run-time representation of the above record was
> {person, [name,age], "fred", 30} - if this were the case then the
> fields of the tuple would be self-describing and we could let the
> compiler turn X.name into a function call lookup(X, name).

In my "frames" proposal, the representation is _effectively_
{{person,age,name}, 30, "fred"}
with {person,age,name} shared.
This means that the size of a "frame" is the same as the
size of the corresponding "record", provided that the
"descriptor" {person,age,name} is shared, as it usually can be.
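
A rough way to check the size claim is sketched below. It assumes the undocumented erts_debug:size/1 and erts_debug:flat_size/1 helpers, which report heap words with and without sharing counted; the module name frame_size and the function sizes/1 are invented for illustration. Exact numbers depend on the OTP release and on whether the compiler places constants in a literal pool, so none are quoted here.

```erlang
-module(frame_size).
-export([sizes/1]).

%% Build N record-style tuples and N frame-style tuples and report
%% their total size in heap words.  The descriptor is bound once,
%% so every frame refers to the same {person,age,name} tuple.
sizes(N) ->
    Name = "fred",
    Desc = {person, age, name},
    Ages = lists:seq(1, N),
    Records = [{person, Name, Age} || Age <- Ages],
    Frames  = [{Desc, Age, Name} || Age <- Ages],
    #{records_shared => erts_debug:size(Records),    % sharing counted once
      frames_shared  => erts_debug:size(Frames),     % sharing counted once
      frames_flat    => erts_debug:flat_size(Frames)}. % as if nothing were shared
```

With sharing taken into account, one would expect frames_shared to exceed records_shared by no more than the single descriptor tuple, while frames_flat shows what the frames would cost if every one carried its own copy of the descriptor.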

I note that the external term format is half-way to what we
want:  it does have provision for 'caching' things and referring
back to them later, only this is limited to atoms.

There are at least three possibilities:

(1) Don't preserve any sharing at all, other than atoms.
     [Present situation.]

(2) Build a hash table based on the *identity* (= address) of
     terms in a first pass, and use that in a second pass.
     [This preserves existing sharing.]

(3) Build a hash table based on the *equality* (= values) of
     terms in a first pass, and use that in a second pass.
     [This may introduce new sharing.]
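
Possibility (3) can be sketched in plain Erlang, layered on top of term_to_binary/1 rather than built into the external term format. This is only an illustration under stated assumptions: the module share_sketch, the {'$ref', Index} marker and the {Table, Body} layout are all invented here, only tuples are shared, and a term that already contains a lone {'$ref', N} tuple would be decoded wrongly.

```erlang
-module(share_sketch).
-export([encode/1, decode/1]).

%% First pass: count tuples by value.  Second pass: replace every
%% tuple seen more than once with {'$ref', Index} into a table.
encode(Term) ->
    Counts = count(Term, #{}),
    Shared = [T || {T, C} <- maps:to_list(Counts), C > 1],
    Index = maps:from_list(lists:zip(Shared, lists:seq(1, length(Shared)))),
    term_to_binary({list_to_tuple(Shared), replace(Term, Index)}).

decode(Bin) ->
    {Table, Body} = binary_to_term(Bin),
    expand(Body, Table).

%% A tuple is descended into only the first time it is seen.
count(T, Acc) when is_tuple(T) ->
    case Acc of
        #{T := N} -> Acc#{T := N + 1};
        _ -> count(tuple_to_list(T), Acc#{T => 1})
    end;
count([H | T], Acc) -> count(T, count(H, Acc));
count(_, Acc) -> Acc.

replace(T, Index) when is_tuple(T) ->
    case Index of
        #{T := I} -> {'$ref', I};
        _ -> list_to_tuple(replace(tuple_to_list(T), Index))
    end;
replace([H | T], Index) -> [replace(H, Index) | replace(T, Index)];
replace(Other, _) -> Other.

%% Table entries are stored unmodified, so a reference expands to
%% the same table element every time it occurs.
expand({'$ref', I}, Table) -> element(I, Table);
expand(T, Table) when is_tuple(T) ->
    list_to_tuple(expand(tuple_to_list(T), Table));
expand([H | T], Table) -> [expand(H, Table) | expand(T, Table)];
expand(Other, _) -> Other.
```

For a list of many frames such as [{{person,age,name}, Age, "fred"} || Age <- lists:seq(1, 100)], the descriptor is written once in the table and each frame carries only a short reference.  For just two or three frames the table overhead can outweigh the saving.  Possibility (2) would need access to term addresses, which ordinary Erlang code does not have, so it is not sketched.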

I note that UBF also allows this kind of compression, which will
obviously not surprise Joe!  I've long wondered why it wasn't
done for the external term format; records or no records, it
looks like a good idea.

Question:  in existing Erlang use, which is more important:
the speed of generating a binary encoding for a term, or how big
the encoding is (which affects how quickly it can be sent across
a network and how fast it can be decoded, amongst other things)?
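
One way to put numbers on that trade-off for a particular term is sketched below.  The module name t2b_measure and the function measure/1 are invented for illustration; timer:tc/1 and the compressed option of term_to_binary/2 are standard, but single-shot timings like these are noisy and should be repeated before drawing conclusions.

```erlang
-module(t2b_measure).
-export([measure/1]).

%% Time encoding (plain and compressed) and decoding of Term, in
%% microseconds, and report the size of each encoding in bytes.
measure(Term) ->
    {EncUs, Bin} = timer:tc(fun() -> term_to_binary(Term) end),
    {ZipUs, Zip} = timer:tc(fun() -> term_to_binary(Term, [compressed]) end),
    {DecUs, _}   = timer:tc(fun() -> binary_to_term(Bin) end),
    #{encode_us            => EncUs,
      bytes                => byte_size(Bin),
      compressed_encode_us => ZipUs,
      compressed_bytes     => byte_size(Zip),
      decode_us            => DecUs}.
```

Calling t2b_measure:measure(Term) on terms typical of a given application gives a first impression of whether encoding time or encoded size dominates there.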