[erlang-questions] term_to_binary and record improvements
Richard A. O'Keefe
ok@REDACTED
Fri Aug 29 04:57:50 CEST 2008
On 29 Aug 2008, at 6:47 am, Joe Armstrong wrote:
> Suppose the run-time representation of the above record was
> {person, [name,age], "fred", 30} - if this were the case then the
> fields of the tuple would be self-describing and we could let the
> compiler turn X.name into a function call lookup(X, name).
In my "frames" proposal, the representation is _effectively_
{{person,age,name}, 30, "fred"}
with {person,age,name} shared.
This means that the size of a "frame" is the same as the
size of the corresponding "record", provided that the
"descriptor" {person,age,name} is shared, as it usually can be.
I note that the external term format is half-way to what we
want: it does have provision for 'caching' things and referring
back to them later, only this is limited to atoms.
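
The loss of term sharing across the external format can be observed with a small experiment. The sketch below assumes the debugging helpers erts_debug:size/1 (counts shared subterms once) and erts_debug:flat_size/1 (counts them every time they occur); these are undocumented debugging aids, so treat the exact numbers as implementation details. The module name sharing_demo is hypothetical. The tag is passed in as an argument, and atoms are used for the names, so that nothing in the term is a compile-time literal.

```erlang
-module(sharing_demo).
-export([sizes/1]).

%% Build two frames that share one descriptor, round-trip them
%% through the external term format, and report heap sizes in words.
sizes(Tag) ->
    D = {Tag, age, name},                 % built at run time
    L = [{D, 30, fred}, {D, 41, wilma}],  % both frames point at D
    L2 = binary_to_term(term_to_binary(L)),
    #{shared_before => erts_debug:size(L),
      flat_before   => erts_debug:flat_size(L),
      shared_after  => erts_debug:size(L2),
      flat_after    => erts_debug:flat_size(L2)}.
```

Calling sharing_demo:sizes(person) should show shared_before smaller than flat_before, while after the round trip the two sizes coincide, because the decoded term contains a separate copy of the descriptor for each frame.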
There are at least three possibilities:
(1) Don't preserve any sharing at all, other than atoms.
[Present situation.]
(2) Build a hash table based on the *identity* (= address) of
terms in a first pass, and use that in a second pass.
[This preserves existing sharing.]
(3) Build a hash table based on the *equality* (= values) of
terms in a first pass, and use that in a second pass.
[This may introduce new sharing.]
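
Possibility (3) can be sketched in plain Erlang; possibility (2) cannot, since Erlang code has no access to term addresses. The following is only an illustration of the idea, not the external term format and not a proposal for its encoding. The module name share_sketch, the functions pack/1 and unpack/1, and the {leaf, X} / {ref, Id} node shapes are all hypothetical. For brevity the table is built and the output produced in a single traversal rather than two passes.

```erlang
-module(share_sketch).
-export([pack/1, unpack/1]).

%% pack/1 returns {Root, Nodes}. Nodes is a list of {Id, Node} in
%% creation order; every compound subterm that is equal (=:=) to an
%% earlier one is emitted once and referred to by {ref, Id}.
pack(Term) ->
    {Root, {_Ids, Nodes}} = walk(Term, {#{}, []}),
    {Root, lists:reverse(Nodes)}.

walk(T, Acc) when is_tuple(T) ->
    {Kids, Acc1} = lists:mapfoldl(fun walk/2, Acc, tuple_to_list(T)),
    intern({tuple, Kids}, Acc1);
walk([H | T], Acc) ->
    {H1, Acc1} = walk(H, Acc),
    {T1, Acc2} = walk(T, Acc1),
    intern({cons, H1, T1}, Acc2);
walk(Leaf, Acc) ->
    %% Atoms, numbers, [] and anything else are kept inline.
    {{leaf, Leaf}, Acc}.

%% Look the node up by value; allocate a fresh Id only if it is new.
intern(Node, {Ids, Nodes} = Acc) ->
    case maps:find(Node, Ids) of
        {ok, Id} ->
            {{ref, Id}, Acc};
        error ->
            Id = map_size(Ids),
            {{ref, Id}, {Ids#{Node => Id}, [{Id, Node} | Nodes]}}
    end.

%% unpack/1 rebuilds the term; children always precede their parents
%% in Nodes, so each node can be built from already-built parts.
unpack({Root, Nodes}) ->
    Built = lists:foldl(
              fun({Id, Node}, M) -> M#{Id => build(Node, M)} end,
              #{}, Nodes),
    resolve(Root, Built).

build({tuple, Kids}, M) -> list_to_tuple([resolve(K, M) || K <- Kids]);
build({cons, H, T}, M) -> [resolve(H, M) | resolve(T, M)].

resolve({leaf, X}, _M) -> X;
resolve({ref, Id}, M) -> maps:get(Id, M).
```

Packing a list of two frames with equal descriptors yields a single node for {person,age,name}, whether or not the two descriptors were the same heap term to begin with. The cost is a table lookup per compound subterm, which is exactly the speed-versus-size trade-off at issue.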
I note that UBF also allows this kind of compression, which will
obviously not surprise Joe! I've long wondered why it wasn't
done; records or no records, it looks like a good idea.
Question: in existing Erlang use, which is more important,
speed of generating a binary encoding for a term, or how big
it is (which relates to how quickly it can be sent across a
network and how fast it can be decoded, amongst other things)?