[erlang-questions] term_to_binary and record improvements
Joe Armstrong
erlang@REDACTED
Fri Aug 29 09:35:44 CEST 2008
On Fri, Aug 29, 2008 at 4:57 AM, Richard A. O'Keefe <ok@REDACTED> wrote:
>
> On 29 Aug 2008, at 6:47 am, Joe Armstrong wrote:
>>
>> Suppose the run-time representation of the above record was
>> {person, [name,age], "fred", 30} - if this were the case then the
>> fields of the tuple would be self-describing and we could let the
>> compiler turn X.name into a function call lookup(X, name).
>
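A minimal sketch of what such a lookup(X, name) call could do, assuming the self-describing layout {Tag, FieldNames, Value1, Value2, ...} from the example above. The module name selfdesc and the error term are made up for illustration; nothing here is an existing OTP function.

```erlang
-module(selfdesc).
-export([lookup/2]).

%% lookup(Rec, Field): Rec is assumed to be {Tag, FieldNames, V1, V2, ...},
%% e.g. {person, [name,age], "fred", 30}. The values start at position 3,
%% in the same order as the names in FieldNames.
lookup(Rec, Field) when is_tuple(Rec), tuple_size(Rec) >= 2 ->
    lookup(element(2, Rec), Field, 3, Rec).

%% Walk the field-name list, keeping track of the matching tuple position.
lookup([Field | _], Field, Pos, Rec) ->
    element(Pos, Rec);
lookup([_ | Rest], Field, Pos, Rec) ->
    lookup(Rest, Field, Pos + 1, Rec);
lookup([], Field, _Pos, _Rec) ->
    erlang:error({no_such_field, Field}).
```

With this, selfdesc:lookup({person, [name,age], "fred", 30}, age) walks past name and returns the value in the fourth slot. The cost is a linear scan of the field list on every access, which is the price of not knowing the layout at compile time.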
> In my "frames" proposal, the representation is _effectively_
> {{person,age,name}, 30, "fred"}
> with {person,age,name} shared.
> This means that the size of a "frame" is the same as the
> size of the corresponding "record", provided that the
> "descriptor" {person,age,name} is shared, as it usually can be.
>
> I note that the external term format is half-way to what we
> want: it does have provision for 'caching' things and referring
> back to them later, only this is limited to atoms.
>
> There are at least three possibilities:
>
> (1) Don't preserve any sharing at all, other than atoms.
> [Present situation.]
>
> (2) Build a hash table based on the *identity* (= address) of
> terms in a first pass, and use that in a second pass.
> [This preserves existing sharing.]
But you don't need to. *Internally* a tuple {Big,Big,Big} is a pointer
to a tuple on the heap which is four words: the first word is {arity,3},
and the next three words are identical pointers to Big. The code for
term_to_binary should therefore look very much like the code for
garbing a process heap onto a new heap.

Just for fun I'll write term_to_binary in Erlang and also make a new
version that shares (in Erlang) - unless somebody has already written
this.
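To make the point about {Big,Big,Big} concrete, here is a small sketch that compares heap size with and without sharing, and the size of the external format. The module name sharing_demo is hypothetical. erts_debug:size/1 and erts_debug:flat_size/1 are undocumented debugging helpers in ERTS, so treat their use here as an assumption about the runtime rather than a supported API.

```erlang
-module(sharing_demo).
-export([sizes/0]).

%% Build one large list at run time and refer to it three times.
%% Returns a property list of sizes so they can be compared.
sizes() ->
    Big = lists:seq(1, 1000),
    T = {Big, Big, Big},
    [%% heap words counting each shared subterm once
     {heap_shared, erts_debug:size(T)},
     %% heap words if every reference were expanded into its own copy
     {heap_flat, erts_debug:flat_size(T)},
     %% bytes of external format for one copy of Big
     {ext_one, byte_size(term_to_binary(Big))},
     %% bytes of external format for the tuple
     {ext_tuple, byte_size(term_to_binary(T))}].
```

On the heap the tuple should cost little more than one copy of Big, while the encoded tuple comes out roughly three times the size of the encoded list, because the encoder writes Big out once per reference.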
>
> (3) Build a hash table based on the *equality* (= values) of
> terms in a first pass, and use that in a second pass.
> [This may introduce new sharing.]
>
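Option (3) can be sketched in plain Erlang, since term equality is available even though term addresses are not. The code below is only an illustration under several assumptions: the module name share_by_value, the {lit,_}/{ref,_}/{tuple,_}/{cons,_,_} node shapes and the definition table are all invented here, it uses maps (so it needs a much newer OTP than this thread), and it is not a real wire format. Every distinct tuple or cons cell is defined once and later occurrences become {ref, Id}.

```erlang
-module(share_by_value).
-export([encode/1, decode/1]).

%% encode(Term) -> {Root, Defs}
%% Defs is a list of {Id, Node}, children always defined before parents.
encode(Term) ->
    {Root, {_Seen, Defs}} = enc(Term, {#{}, []}),
    {Root, lists:reverse(Defs)}.

%% Look the term up by value first; only descend into unseen terms.
enc(T, {Seen, _Defs} = St) ->
    case maps:find(T, Seen) of
        {ok, Id} -> {{ref, Id}, St};
        error    -> enc_new(T, St)
    end.

enc_new(T, St) when is_tuple(T) ->
    {Kids, St1} = lists:mapfoldl(fun enc/2, St, tuple_to_list(T)),
    define(T, {tuple, Kids}, St1);
enc_new([H | Tl] = L, St) ->
    {H1, St1} = enc(H, St),
    {T1, St2} = enc(Tl, St1),
    define(L, {cons, H1, T1}, St2);
enc_new(Atomic, St) ->
    %% atoms, numbers, [], binaries etc. are emitted in place
    {{lit, Atomic}, St}.

%% Record a new compound term under the next free id.
define(Key, Node, {Seen, Defs}) ->
    Id = maps:size(Seen),
    {{ref, Id}, {Seen#{Key => Id}, [{Id, Node} | Defs]}}.

%% decode rebuilds definitions in order, so each ref is already built.
decode({Root, Defs}) ->
    Built = lists:foldl(
              fun({Id, Node}, Acc) -> Acc#{Id => build(Node, Acc)} end,
              #{}, Defs),
    build(Root, Built).

build({lit, X}, _Built)        -> X;
build({ref, Id}, Built)        -> maps:get(Id, Built);
build({tuple, Kids}, Built)    -> list_to_tuple([build(K, Built) || K <- Kids]);
build({cons, H, T}, Built)     -> [build(H, Built) | build(T, Built)].
```

For example, encoding {[a,b,c], [a,b,c], [a,b,c]} defines the three cons cells and the tuple once each, and the second and third list occurrences become references, whether or not they were shared on the heap. Keying a map on whole subterms is expensive for long lists, since every suffix is hashed, so this shows the idea rather than a practical encoder.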
> I note that UBF also allows this kind of compression, which will
> obviously not surprise Joe! I've long wondered why it wasn't
> done; records or no records it looks like a good idea.
Actually we could use UBF as the external format!
For compatibility we could write term_to_new_binary and its inverse.
> Question: in existing Erlang use, which is more important,
> speed of generating a binary encoding for a term, or how big
> it is (which relates to how quickly it can be sent across a
> network and how fast it can be decoded, amongst other things).
I think we can make it smaller and faster ...
/Joe