[erlang-questions] extracting sub-terms from term_to_binary encoded terms without unpacking first

Thu Nov 17 16:12:27 CET 2011

Hi Joe,

I also have many term_to_binary calls and, as a result, a great many binary
chunks stored in a DB.

As I understand it, to use your technique I need pimped versions of the hd,
tl and element functions. Could you please provide a few more details about
them?

Best regards,
Max

On Thu, Nov 17, 2011 at 5:52 PM, Joe Armstrong <erlang@REDACTED> wrote:

> Here's a programming technique that might be useful which I haven't seen
> described before ...
>
> I've been playing with unpacking binaries produced by term_to_binary(Term)
> in other languages. Specifically, I do term_to_binary in Erlang, creating a
> binary, and I send the binary to JavaScript. The JavaScript code does not by
> default decode the entire binary, but accesses sub-terms through selector
> functions (you only need element, hd and tl).
>
> This technique seems much nicer than mucking around with JSON: binary
> formats are way easier to manipulate than text formats that need parsing.
>
> Now of course you can do the same thing in Erlang: you do not have to
> do binary_to_term(B) to extract a sub-term, but can traverse the internal
> structure of the external format and pull out exactly what you want and
> nothing else.
>
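
To make "the internal structure of the external format" concrete, here is a small sketch that inspects only the first three bytes of an encoded tuple. The module and function names (ext_layout, header/1) are made up for illustration; the byte values 131 (version) and 104 (SMALL_TUPLE_EXT) come from the documented external term format.

```erlang
-module(ext_layout).
-export([header/1]).

%% Hypothetical helper: look at the header of a binary produced by
%% term_to_binary/1 for a tuple with at most 255 elements.
%% Layout: version byte (131), tag 104 (SMALL_TUPLE_EXT), arity (1 byte),
%% followed by the encoded elements, which are left untouched here.
header(<<Version, 104, Arity:8, _Elements/binary>>) ->
    {Version, small_tuple, Arity}.
```

Calling ext_layout:header(term_to_binary({foo,bar,[a,b]})) only matches on the header; the elements stay packed.
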
> I often store large terms in files and databases using term_to_binary
> and I extract data by first doing binary_to_term and
> then pattern matching on the result.
>
> For example if I create a binary with:
>
>    > B = term_to_binary({foo,bar,[a,b]})
>
> And I want to extract the 'b' sub term, I'd normally write
>
>      {_, _, [_,X]} = binary_to_term(B)
>
> But why bother to unpack? I could just as well write
>
>      X = hd(tl(element(3,B)))
>
> These are not the regular hd, tl and element but hacked versions that can
> traverse the external format.
>
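
A minimal sketch of what such selectors might look like in Erlang follows. It is not Joe's code; the module name ext_term and the function names open/1, ext_element/2, ext_hd/1, ext_tl/1 and ext_decode/1 are assumptions chosen to avoid clashing with the built-in hd/tl/element. It only understands a handful of tags from the external term format (small integers, atoms, binaries, strings, tuples and lists) and crashes on anything else, including compressed terms and lists that were encoded as STRING_EXT.

```erlang
-module(ext_term).
-export([open/1, ext_element/2, ext_hd/1, ext_tl/1, ext_decode/1]).

%% A "handle" is either a sub-binary positioned at a term tag, or
%% {list, N, Bin}: N list elements still to come in Bin, then the tail term.

%% Strip the version byte (131) that term_to_binary/1 puts in front.
open(<<131, Rest/binary>>) -> Rest.

%% element(N, Tuple) on a packed tuple: skip N-1 elements, return a handle.
ext_element(N, <<104, Arity:8, Rest/binary>>) when N >= 1, N =< Arity ->
    skip_n(N - 1, Rest);                                   % SMALL_TUPLE_EXT
ext_element(N, <<105, Arity:32, Rest/binary>>) when N >= 1, N =< Arity ->
    skip_n(N - 1, Rest).                                   % LARGE_TUPLE_EXT

%% hd(List): the first element starts right after the LIST_EXT header.
ext_hd(<<108, Len:32, Rest/binary>>) when Len > 0 -> Rest;
ext_hd({list, N, Rest}) when N > 0 -> Rest.

%% tl(List): skip one element and remember how many remain.
ext_tl(<<108, Len:32, Rest/binary>>) when Len > 0 -> tail(Len - 1, skip(Rest));
ext_tl({list, N, Rest}) when N > 0 -> tail(N - 1, skip(Rest)).

tail(0, Rest) -> Rest;             % only the tail term (usually NIL_EXT) is left
tail(N, Rest) -> {list, N, Rest}.

%% Decode only the term a binary handle points at: copy just that sub-term's
%% bytes, re-attach the version byte and hand it to binary_to_term/1.
ext_decode(Bin) when is_binary(Bin) ->
    Size = byte_size(Bin) - byte_size(skip(Bin)),
    <<Sub:Size/binary, _/binary>> = Bin,
    binary_to_term(<<131, Sub/binary>>).

%% Skip one encoded term and return the bytes that follow it.
skip(<<97, _:8, R/binary>>) -> R;                          % SMALL_INTEGER_EXT
skip(<<98, _:32, R/binary>>) -> R;                         % INTEGER_EXT
skip(<<T, L:16, _:L/binary, R/binary>>)                    % ATOM_EXT,
  when T =:= 100; T =:= 118; T =:= 107 -> R;               % ATOM_UTF8_EXT, STRING_EXT
skip(<<T, L:8, _:L/binary, R/binary>>)                     % SMALL_ATOM_EXT,
  when T =:= 115; T =:= 119 -> R;                          % SMALL_ATOM_UTF8_EXT
skip(<<109, L:32, _:L/binary, R/binary>>) -> R;            % BINARY_EXT
skip(<<106, R/binary>>) -> R;                              % NIL_EXT
skip(<<104, A:8, R/binary>>) -> skip_n(A, R);              % SMALL_TUPLE_EXT
skip(<<105, A:32, R/binary>>) -> skip_n(A, R);             % LARGE_TUPLE_EXT
skip(<<108, L:32, R/binary>>) -> skip_n(L + 1, R).         % LIST_EXT: L elements + tail

skip_n(0, R) -> R;
skip_n(N, R) -> skip_n(N - 1, skip(R)).
```

With this sketch, the example above would be written as ext_term:ext_decode(ext_term:ext_hd(ext_term:ext_tl(ext_term:ext_element(3, ext_term:open(B))))). The selectors return sub-binaries rather than copies, so only the final ext_decode/1 call builds a term, and it copies just the bytes of the selected sub-term. Skipping still has to walk over every element that precedes the one you want, so fields near the front of a term are the cheapest to reach.
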
> If the term inside the external format is large and I only want to extract
> a few parameters, then this method should be a lot faster than actually
> building a large term just to throw it away after pattern matching.
> This should be a GC- and cache-friendly way of doing things.
>
> In a similar vein one could think of pattern matching being extended over
> packed terms.
>
> If this were so I could write:
>
>      T = {foo,bar,[a,b]},
>      B = term_to_binary(T),
>      match(B).
>
> match({_,_,[_,X]}) -> X.
>
> Doing so would mean that once we have packed terms using term_to_binary we
> could leave them
> alone and extract data from them without having to completely unpack them.
>
> This should be very cache friendly: Erlang terms can be scattered all over
> the place in virtual memory, but in the external form the whole term is kept
> together in memory.
>
> This is actually pretty useful. I have a data structure representing a
> book; somewhere near the beginning there is a title. The entire book is
> stored on disk as a term_to_binary-encoded blob. Now I have a large number
> of these, representing ebooks. If I want to list all titles I certainly do
> not want to completely unpack everything; I only want to extract the title
> field and nothing else. ...
>
> Cheers
>
> /Joe