[erlang-questions] extracting sub-terms from term_to_binary encoded terms without unpacking first

Joe Armstrong erlang@REDACTED
Thu Nov 17 16:40:53 CET 2011

On Thu, Nov 17, 2011 at 4:12 PM, Max Bourinov <bourinov@REDACTED> wrote:

> Hi Joe,
> I also have many term_to_binary calls and, as a result, a great many
> binary data chunks stored in the DB.
> As I understand it, to use your technique I need pimped versions of the
> hd, tl and element functions. Can you please provide a little more detail
> about them?

The external format is described in


To test my understanding of this I wrote a simple program that
reconstructs a term from the external format (enclosed).

A javascript program that does almost the same thing as my program is here:


As you can see decoding a term is easy.

The tuple {a,b,c} gets encoded as


131 means "external format"
104,3 means a tuple with 3 elements
100,0,1,97 means the atom a (tag, two-byte length, then the character) and so on
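
As a rough illustration of the byte layout just described, here is a
minimal sketch that reads only the version byte, the tuple header and the
first atom. The module and function names (etf_peek, describe, atom_at) are
made up for this example. It assumes the tuple is small (tag 104) and
non-empty, and that its first element is an atom. Recent OTP releases emit
atoms with tag 119 and a one-byte length rather than tag 100 with a two-byte
length, so both are accepted.

```erlang
-module(etf_peek).
-export([describe/1]).

%% describe(Bin) -> {tuple, Arity, FirstAtom}
%% Reads only the header bytes: 131 (version), 104 (small tuple tag),
%% the arity byte, and then the first element, assumed to be an atom.
describe(<<131, 104, Arity, Rest/binary>>) ->
    {tuple, Arity, atom_at(Rest)}.

%% Tag 100 (ATOM_EXT): two-byte length, then the name.
atom_at(<<100, Len:16, Name:Len/binary, _/binary>>) ->
    binary_to_atom(Name, latin1);
%% Tag 119 (SMALL_ATOM_UTF8_EXT): one-byte length, then the name.
atom_at(<<119, Len:8, Name:Len/binary, _/binary>>) ->
    binary_to_atom(Name, utf8).
```

Calling etf_peek:describe(term_to_binary({a,b,c})) should give
{tuple,3,a} whichever of the two atom encodings the runtime produces.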

If you keep a pointer into this structure pointing at the second byte,
the tuple tag (call this P), and want to implement element(3, P), then you
just check that P points to a tuple and skip over the first two elements
to reach the third.
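
The skipping step could look roughly like the sketch below. This is not the
enclosed decode_bin.erl; the names packed_tuple, open, element_at and to_term
are invented for illustration, and skip/1 handles only a subset of tags
(small and 32-bit integers, atoms, nil, strings, binaries, tuples and lists).
A sub-binary plays the role of the pointer P.

```erlang
-module(packed_tuple).
-export([open/1, element_at/2, to_term/1]).

%% Drop the 131 version byte; the result "points" at the first tag.
open(<<131, P/binary>>) -> P.

%% element_at(K, P): sub-binary starting at the K'th element of the
%% tuple that P points to. Fails if P is not a tuple or K is out of range.
element_at(K, <<104, Arity, Rest/binary>>) when K >= 1, K =< Arity ->
    skip_n(K - 1, Rest);
element_at(K, <<105, Arity:32, Rest/binary>>) when K >= 1, K =< Arity ->
    skip_n(K - 1, Rest).

%% Decode just the one term that P points to, ignoring what follows it.
to_term(P) ->
    Size = byte_size(P) - byte_size(skip(P)),
    <<Term:Size/binary, _/binary>> = P,
    binary_to_term(<<131, Term/binary>>).

skip_n(0, P) -> P;
skip_n(N, P) -> skip_n(N - 1, skip(P)).

%% skip(P): the bytes that follow the term starting at P.
skip(<<97, _, R/binary>>) -> R;                      % SMALL_INTEGER_EXT
skip(<<98, _:32, R/binary>>) -> R;                   % INTEGER_EXT
skip(<<100, L:16, _:L/binary, R/binary>>) -> R;      % ATOM_EXT
skip(<<115, L, _:L/binary, R/binary>>) -> R;         % SMALL_ATOM_EXT
skip(<<118, L:16, _:L/binary, R/binary>>) -> R;      % ATOM_UTF8_EXT
skip(<<119, L, _:L/binary, R/binary>>) -> R;         % SMALL_ATOM_UTF8_EXT
skip(<<106, R/binary>>) -> R;                        % NIL_EXT
skip(<<107, L:16, _:L/binary, R/binary>>) -> R;      % STRING_EXT
skip(<<109, L:32, _:L/binary, R/binary>>) -> R;      % BINARY_EXT
skip(<<104, A, R/binary>>) -> skip_n(A, R);          % SMALL_TUPLE_EXT
skip(<<105, A:32, R/binary>>) -> skip_n(A, R);       % LARGE_TUPLE_EXT
skip(<<108, N:32, R/binary>>) -> skip_n(N + 1, R).   % LIST_EXT: N elements + tail
```

With B = term_to_binary({foo,bar,[a,b]}), the call
packed_tuple:to_term(packed_tuple:element_at(3, packed_tuple:open(B)))
decodes only the list [a,b]; the first two elements are stepped over
without being built.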

The external format is not actually designed for rapid random access, but
it's not too bad, so this is pretty easy.

Turning the code that I posted here into a set of access functions is easy.
You only need element(K) to step into tuples, and hd and tl
to step into lists (though an nth would be a good idea).
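
For the list side, one possible shape is sketched here, again with invented
names (packed_list, phd, ptl, pnth). One wrinkle: the tail of an encoded list
is not itself an encoded list, because the element count lives in the header,
so this sketch represents a tail as a cursor {cons, N, P}. It assumes lists
encoded with tag 108; lists of small integers are encoded as strings (tag 107)
and are not handled by phd/ptl.

```erlang
-module(packed_list).
-export([open/1, phd/1, ptl/1, pnth/2, to_term/1]).

%% Drop the 131 version byte; the result points at the first tag.
open(<<131, P/binary>>) -> P.

%% A list position is either a binary starting at a LIST_EXT header (108)
%% or a cursor {cons, N, P}: N elements remain and P points at the next one.
phd(<<108, N:32, P/binary>>) when N > 0 -> P;
phd({cons, N, P}) when N > 0 -> P.

ptl(<<108, N:32, P/binary>>) when N > 0 -> rest(N - 1, skip(P));
ptl({cons, N, P}) when N > 0 -> rest(N - 1, skip(P)).

%% After the last element comes the encoded tail (NIL_EXT for a proper list).
rest(0, P) -> P;
rest(N, P) -> {cons, N, P}.

pnth(1, L) -> phd(L);
pnth(K, L) when K > 1 -> pnth(K - 1, ptl(L)).

%% Decode just the one term that the binary P points to.
to_term(P) when is_binary(P) ->
    Size = byte_size(P) - byte_size(skip(P)),
    <<Term:Size/binary, _/binary>> = P,
    binary_to_term(<<131, Term/binary>>).

skip_n(0, P) -> P;
skip_n(N, P) -> skip_n(N - 1, skip(P)).

%% skip(P): the bytes that follow the term starting at P (subset of tags).
skip(<<97, _, R/binary>>) -> R;                      % SMALL_INTEGER_EXT
skip(<<98, _:32, R/binary>>) -> R;                   % INTEGER_EXT
skip(<<100, L:16, _:L/binary, R/binary>>) -> R;      % ATOM_EXT
skip(<<115, L, _:L/binary, R/binary>>) -> R;         % SMALL_ATOM_EXT
skip(<<118, L:16, _:L/binary, R/binary>>) -> R;      % ATOM_UTF8_EXT
skip(<<119, L, _:L/binary, R/binary>>) -> R;         % SMALL_ATOM_UTF8_EXT
skip(<<106, R/binary>>) -> R;                        % NIL_EXT
skip(<<107, L:16, _:L/binary, R/binary>>) -> R;      % STRING_EXT
skip(<<109, L:32, _:L/binary, R/binary>>) -> R;      % BINARY_EXT
skip(<<104, A, R/binary>>) -> skip_n(A, R);          % SMALL_TUPLE_EXT
skip(<<105, A:32, R/binary>>) -> skip_n(A, R);       % LARGE_TUPLE_EXT
skip(<<108, N:32, R/binary>>) -> skip_n(N + 1, R).   % LIST_EXT: N elements + tail
```

Note that pnth(K, L) still has to step over the K-1 elements in front of the
one it wants, so it is linear in K, just as it is for an ordinary list.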

For this to be efficient you would need to do this as a NIF, since the
Erlang code that does the same thing would create some garbage as it
executes.

I haven't done any of this - my goal is to create binaries in Erlang
and decode them in JavaScript.



> Best regards,
> Max
> On Thu, Nov 17, 2011 at 5:52 PM, Joe Armstrong <erlang@REDACTED> wrote:
>> Here's a programming technique that might be useful which I haven't seen
>> described before ...
>> I've been playing with unpacking binaries produced by term_to_binary(Term) in
>> other languages. Specifically I do term_to_binary in Erlang creating
>> binary and I send the
>> binary to javascript. The javascript code does not by default decode the
>> entire binary,
>> but accesses sub-terms through selector functions (you only need element,
>> hd and tl)
>> This technique seems much nicer than mucking around with JSON:
>> binary formats are way easier to manipulate than text formats that
>> need parsing.
>> Now of course you can do the same thing in Erlang, you do not have to
>> do binary_to_term(B) to extract a sub-term, but can traverse the internal
>> structure
>> of the external format and pull out exactly what you want and nothing
>> else.
>> I often store large terms in files and databases using term_to_binary
>> and I extract data by first doing binary_to_term and
>> then pattern matching on the result.
>> For example if I create a binary with:
>>    > B = term_to_binary({foo,bar,[a,b]})
>> And I want to extract the 'b' sub term, I'd normally write
>>      {_, _, [_,X]} = binary_to_term(B)
>> But why bother to unpack? I could just as well write
>>      X = hd(tl(element(3,B)))
>> This is not the regular hd/tl/and element but a hacked version that can
>> traverse the external format.
>> If the term inside the external format is large and I only want to
>> extract a few parameters,
>> then this method should be a lot faster than actually building a large
>> term just to throw it away after pattern matching.
>> This should be a GC and cache friendly way of doing things.
>> In a similar vein one could think of pattern matching being extended over
>> packed terms.
>> If this were so I could write:
>>      T = {foo,bar,[a,b]},
>>      B = term_to_binary(T),
>>      match(B).
>> match({_,_,[_,X]}) -> X.
>> Doing so would mean that once we have packed terms using term_to_binary
>> we could leave them
>> alone and extract data from them without having to completely unpack them.
>> This should be very cache friendly - Erlang terms can be scattered all over
>> the place in virtual memory,
>> but in the external form the whole term is kept together in memory.
>> This is actually pretty useful - I have a data structure representing a
>> book - somewhere near the beginning there is a title.
>> The entire book is stored on disk as a term_to_binary encoded blob. Now I
>> have a large number of these
>> representing ebooks. If I want to list all titles I certainly do not want
>> to completely unpack everything,
>> I only want to extract the title field and nothing else. ...
>> Cheers
>> /Joe
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
-------------- next part --------------
A non-text attachment was scrubbed...
Name: decode_bin.erl
Type: text/x-erlang
Size: 2368 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20111117/8cfee985/attachment.bin>
