[erlang-questions] term_to_binary and record improvements

Fri Aug 29 09:35:44 CEST 2008

On Fri, Aug 29, 2008 at 4:57 AM, Richard A. O'Keefe <ok@REDACTED> wrote:
>
> On 29 Aug 2008, at 6:47 am, Joe Armstrong wrote:
>>
>>  Suppose the run-time representation of the above record was
>> {person, [name,age], "fred", 30} - if this were the case then the
>> fields of the tuple would be self-describing and we could let the
>> compiler turn X.name into a function call lookup(X, name).
>
> In my "frames" proposal, the representation is _effectively_
> {{person,age,name}, 30, "fred"}
> with {person,age,name} shared.
> This means that the size of a "frame" is the same as the
> size of the corresponding "record", provided that the
> "descriptor" {person,age,name} is shared, as it usually can be.
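
To make the two layouts concrete, here is a minimal sketch of the field lookup the compiler could generate for each. The module name frame_sketch and the functions lookup_selfdesc/2, lookup_frame/2 and index_of/3 are hypothetical names chosen for illustration; neither proposal specifies them.

```erlang
-module(frame_sketch).
-export([lookup_selfdesc/2, lookup_frame/2]).

%% Self-describing layout: {Tag, [Field1, ..., FieldN], V1, ..., VN}
%% The values start at element 3, so field number I lives at element 2 + I.
lookup_selfdesc(X, Field) ->
    Names = element(2, X),
    element(2 + index_of(Field, Names, 1), X).

%% "Frame" layout: {{Tag, Field1, ..., FieldN}, V1, ..., VN}
%% The descriptor tuple is element 1, so field number I lives at element 1 + I.
lookup_frame(X, Field) ->
    [_Tag | Names] = tuple_to_list(element(1, X)),
    element(1 + index_of(Field, Names, 1), X).

%% Position of Field in Names, counting from I.
%% An unknown field crashes with function_clause (assumed behaviour).
index_of(Field, [Field | _], I) -> I;
index_of(Field, [_ | Rest], I) -> index_of(Field, Rest, I + 1).
```

With these definitions, frame_sketch:lookup_selfdesc({person, [name,age], "fred", 30}, name) and frame_sketch:lookup_frame({{person,age,name}, 30, "fred"}, name) both return "fred". A linear search over the field names is used only for brevity; a real implementation would presumably resolve the offset more cheaply.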
>
> I note that the external term format is half-way to what we
> want:  it does have provision for 'caching' things and referring
> back to them later, only this is limited to atoms.
>
> There are at least three possibilities:
>
> (1) Don't preserve any sharing at all, other than atoms.
>    [Present situation.]
>
> (2) Build a hash table based on the *identity* (= address) of
>    terms in a first pass, and use that in a second pass.
>    [This preserves existing sharing.]

But you don't need to: *internally* a tuple {Big,Big,Big} is a pointer to
a tuple on the heap which is four words. The first word is {arity,3}, and
the next three words are identical pointers to Big. The code for
tuple_to_list should therefore look very much like the code for garbing
a process heap onto a new heap.
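
The sharing described here, and its loss in the external format, can be observed with a small experiment. This is only a sketch: the module name sharing_demo and the function sizes/0 are made up for the example, and erts_debug:size/1 and erts_debug:flat_size/1 are internal debugging helpers rather than a documented API.

```erlang
-module(sharing_demo).
-export([sizes/0]).

%% Compare the heap size of {Big,Big,Big} with and without sharing,
%% and the size of its external encoding.
sizes() ->
    Big = lists:seq(1, 1000),
    T = {Big, Big, Big},
    Bin = term_to_binary(T),
    Copy = binary_to_term(Bin),
    [%% heap words, counting the shared list once
     {shared_words, erts_debug:size(T)},
     %% heap words if every pointer to Big had its own copy
     {flat_words, erts_debug:flat_size(T)},
     %% heap words of the term after a round trip through the binary
     {copy_words, erts_debug:size(Copy)},
     %% bytes used to encode the tuple and a single Big
     {encoded_bytes, byte_size(Bin)},
     {one_big_bytes, byte_size(term_to_binary(Big))}].
```

Calling sharing_demo:sizes() should show shared_words well below flat_words, copy_words equal to flat_words (the decoded term no longer shares Big), and encoded_bytes roughly three times one_big_bytes.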

Just for fun I'll write term_to_binary in Erlang and also make a new
version that shares (in Erlang), unless somebody has already written this.

>
> (3) Build a hash table based on the *equality* (= values) of
>    terms in a first pass, and use that in a second pass.
>    [This may introduce new sharing.]
>
> I note that UBF also allows this kind of compression, which will
> obviously not surprise Joe!  I've long wondered why it wasn't
> done; records or no records it looks like a good idea.
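
Possibility (3) can be sketched in plain Erlang, since equality is all the language exposes (term addresses are not visible). The sketch below does it in one pass rather than two and produces an intermediate term rather than a real wire format; the module name share_enc and the {def, Id, Body} / {ref, Id} tags are invented for the example. It assumes proper lists only and needs maps (OTP 17 or later).

```erlang
-module(share_enc).
-export([encode/1, decode/1]).

%% Every tuple or non-empty list becomes {def, Id, Body} the first time
%% an equal term is seen and {ref, Id} afterwards. Everything else is a
%% leaf and is emitted unchanged. Since all tuples get wrapped, a bare
%% leaf can never be mistaken for a def or ref.
encode(Term) ->
    {Enc, _Seen} = enc(Term, #{}),
    Enc.

enc(T, Seen) when is_tuple(T); is_list(T), T =/= [] ->
    case maps:find(T, Seen) of
        {ok, Id} ->
            {{ref, Id}, Seen};
        error ->
            {Body, Seen1} = enc_body(T, Seen),
            Id = maps:size(Seen1),            % fresh, never reused
            {{def, Id, Body}, Seen1#{T => Id}}
    end;
enc(Leaf, Seen) ->
    {Leaf, Seen}.

enc_body(T, Seen) when is_tuple(T) ->
    {Es, Seen1} = lists:mapfoldl(fun enc/2, Seen, tuple_to_list(T)),
    {{tuple, Es}, Seen1};
enc_body(L, Seen) ->
    {Es, Seen1} = lists:mapfoldl(fun enc/2, Seen, L),
    {{list, Es}, Seen1}.

%% decode/1 walks the encoding in the same order, so every {ref, Id}
%% is reached after the matching {def, Id, _} has been rebuilt.
decode(Enc) ->
    {Term, _Tab} = dec(Enc, #{}),
    Term.

dec({ref, Id}, Tab) ->
    {maps:get(Id, Tab), Tab};
dec({def, Id, {Kind, Es}}, Tab) ->
    {Ts, Tab1} = lists:mapfoldl(fun dec/2, Tab, Es),
    T = case Kind of
            tuple -> list_to_tuple(Ts);
            list -> Ts
        end,
    {T, Tab1#{Id => T}};
dec(Leaf, Tab) ->
    {Leaf, Tab}.
```

For a term such as {Big,Big,Big}, term_to_binary(share_enc:encode(T)) carries Big only once, and share_enc:decode/1 rebuilds a term in which the three elements share one list. Using whole terms as map keys means hashing and comparing them, which bears directly on the speed versus size question.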

Actually we could use UBF as the external format!

For compatibility we could write term_to_new_binary and its inverse.

> Question:  in existing Erlang use, which is more important,
> speed of generating a binary encoding for a term, or how big
> it is (which relates to how quickly it can be sent across a
> network and how fast it can be decoded, amongst other things).

I think we can make it smaller and faster ...

/Joe
