[erlang-questions] Extending term external format to support shared substructures

Bjorn Gustavsson bgustavsson@REDACTED
Tue Mar 31 11:46:12 CEST 2009


On Tue, Mar 31, 2009 at 6:37 AM, Matthew Dempsky <matthew@REDACTED> wrote:
> On Mon, Mar 30, 2009 at 6:19 PM, Matthew Dempsky <matthew@REDACTED> wrote:
>> Unless anyone is strongly opposed to the idea, I'll work on a
>> proof-of-concept patch.
>
> Applying the patch below to R13A extends binary_to_term to support the
> 'D' and 'w' type tags as I described above.  For example:
>
> 1> binary_to_term(<<131,$D,3:32, $k,5:16,"hello", $k,5:16,"world",
> $h,2,$w,0:32,$w,1:32, $l,4:32,$w,2:32,$w,2:32,$w,2:32,$w,2:32,$j>>).
> [{"hello","world"},
>  {"hello","world"},
>  {"hello","world"},
>  {"hello","world"}]
>
> (This example uses a three element dictionary: the strings "hello" and
> "world" are the first two words, the tuple {"hello", "world"} is the
> third, using references to the dictionary instances; finally, the term
> value is a length-4 list using 4 references to the dictionary tuple.)
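
For readers without the patch applied, the layout can be approximated in plain Erlang. This is only a sketch: the meaning of 'D' (a 32-bit entry count followed by that many terms) and 'w' (a 32-bit, zero-based index into the dictionary) is inferred from the example above, the module name dict_sketch is hypothetical, and only the tags used in the example are handled.

```erlang
-module(dict_sketch).
-export([decode/1]).

%% Hypothetical decoder for the proposed 'D'/'w' extension.
%% Assumed layout: 131, $D, Count:32, Count dictionary terms, main term.
decode(<<131, $D, N:32, Rest0/binary>>) ->
    {Dict, Rest1} = dict_entries(N, Rest0, []),
    {Term, <<>>} = term(Rest1, Dict),
    Term.

%% Decode N dictionary entries; each entry may refer to earlier ones.
dict_entries(0, Bin, Acc) ->
    {list_to_tuple(lists:reverse(Acc)), Bin};
dict_entries(N, Bin, Acc) ->
    {T, Rest} = term(Bin, list_to_tuple(lists:reverse(Acc))),
    dict_entries(N - 1, Rest, [T | Acc]).

%% 'w': reference to a dictionary entry (assumed zero-based index).
term(<<$w, I:32, Rest/binary>>, Dict) ->
    {element(I + 1, Dict), Rest};
%% 'k': STRING_EXT, 16-bit length followed by the bytes.
term(<<$k, Len:16, Str:Len/binary, Rest/binary>>, _Dict) ->
    {binary_to_list(Str), Rest};
%% 'h': SMALL_TUPLE_EXT, 8-bit arity followed by the elements.
term(<<$h, Arity, Rest/binary>>, Dict) ->
    {Elems, Rest1} = terms(Arity, Rest, Dict, []),
    {list_to_tuple(Elems), Rest1};
%% 'l': LIST_EXT, 32-bit length, the elements, then the tail.
term(<<$l, Len:32, Rest/binary>>, Dict) ->
    {Elems, Rest1} = terms(Len, Rest, Dict, []),
    {Tail, Rest2} = term(Rest1, Dict),
    {Elems ++ Tail, Rest2};
%% 'j': NIL_EXT, the empty list.
term(<<$j, Rest/binary>>, _Dict) ->
    {[], Rest}.

%% Decode N consecutive terms.
terms(0, Bin, _Dict, Acc) ->
    {lists:reverse(Acc), Bin};
terms(N, Bin, Dict, Acc) ->
    {T, Rest} = term(Bin, Dict),
    terms(N - 1, Rest, Dict, [T | Acc]).
```

Calling dict_sketch:decode/1 on the binary from the example should give the same list of four {"hello","world"} tuples.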

Thanks for the patch. Even though it looks fine, we will not include it
until the code for encoding a term has been written.

> The patch isn't entirely minimal; it also fixes a decoding problem for
> zero length LIST_EXT structures, avoids allocating extra heap cells
> for list structures, and refactors some of the integer unpacking to
> use get_int16 and get_int32.

Thanks for pointing out those issues.

I will include the use of the macros and the correction of the number of
heap words needed for lists in R13B.

Your fix for a zero length LIST_EXT doesn't seem to be correct, though.
Have you tried:

binary_to_term(<<131,104,2,108,0,0,0,0,106,100,0,1,97>>).
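
Spelled out with symbolic tags, those bytes are a two-element tuple whose first element is a LIST_EXT of length zero with a NIL_EXT tail and whose second element is the atom a. The breakdown below follows the standard external term format; the variable name Bin is only for illustration.

```erlang
%% The same binary, written with the tag characters and field widths.
Bin = <<131,              % version number
        $h, 2,            % SMALL_TUPLE_EXT, arity 2
        $l, 0:32,         % LIST_EXT, length 0 (no elements follow)
        $j,               % NIL_EXT, the tail of the zero-length list
        $d, 1:16, "a">>,  % ATOM_EXT, length 1, name "a"
Bin =:= <<131,104,2,108,0,0,0,0,106,100,0,1,97>>.
```

A decoder that handles the zero-length list correctly still has to consume the tail and then decode the atom that follows, so the expected result is {[],a}.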

My correction for this problem will appear in the next R13B
snapshot (hopefully tomorrow) at:

http://www.erlang.org/download/snapshots/

(It is also a good idea to use a snapshot as a base for further patches,
as I have eliminated the deep recursion in term_to_binary/1.)

> On a related note, I don't really understand why decoded_size keeps a
> stack of values for 'terms'.  It seems like it should just be possible
> to keep a running grand-total rather than pushing and popping from a
> stack.
>

To make sure that all terms are properly nested. We try to do as much error
checking as possible while calculating the size.
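
As a rough illustration of a nesting check driven by a stack, here is a sketch in Erlang. It is not the emulator's C code: the module name nesting_sketch and the handful of supported tags are assumptions, and the stack simply records how many terms are still expected at each level.

```erlang
-module(nesting_sketch).
-export([well_nested/1]).

%% Returns true if the binary holds exactly one complete term.
%% Only NIL_EXT, SMALL_INTEGER_EXT, ATOM_EXT, SMALL_TUPLE_EXT and
%% LIST_EXT are recognised in this sketch.
well_nested(<<131, Bin/binary>>) -> walk(Bin, [1]);
well_nested(_) -> false.

%% The stack holds the number of terms still expected at each level.
walk(<<>>, []) -> true;                    % everything consumed
walk(_, []) -> false;                      % trailing bytes
walk(Bin, [0 | Stack]) -> walk(Bin, Stack);% level complete, pop it
walk(<<>>, _) -> false;                    % input ended too early
walk(<<$j, R/binary>>, [N | S]) -> walk(R, [N - 1 | S]);
walk(<<$a, _, R/binary>>, [N | S]) -> walk(R, [N - 1 | S]);
walk(<<$d, L:16, _:L/binary, R/binary>>, [N | S]) ->
    walk(R, [N - 1 | S]);
%% A tuple opens a new level with Arity pending elements.
walk(<<$h, A, R/binary>>, [N | S]) -> walk(R, [A, N - 1 | S]);
%% A list opens a level with Len elements plus one tail term.
walk(<<$l, L:32, R/binary>>, [N | S]) -> walk(R, [L + 1, N - 1 | S]);
walk(_, _) -> false.                       % unknown tag or truncated
```

With this sketch, the zero-length LIST_EXT binary above is accepted, while a tuple that is cut short or followed by stray bytes is rejected.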

/Bjorn
-- 
Björn Gustavsson, Erlang/OTP, Ericsson AB