[erlang-questions] sext and Tokyo Tyrant (Re: [erlang-questions] sortable serialization format)

Paul Mineiro
Sat Oct 31 17:18:36 CET 2009


Awesome.  That's 90% of tcerl; most of the complexity had to do with
implementing Erlang term order over encoded binaries, some of the rest
with augmenting term_to_binary to allow for prefixes (there's no smallest
or largest term in Erlang), and the remainder with query planning.  The
query planning part is pure Erlang and independent of the term encoding
(http://code.google.com/p/tcerl/source/browse/trunk/tcerl/src/tcbdbmsutil.erl),
so you could hopefully reuse it.

In retrospect, I clearly should have abandoned term_to_binary, if only
because Erlang is easy and C is hard, so the C side should just be memcmp.
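A minimal Erlang sketch of the memcmp point. The module order_sketch and its enc_int/1 are hypothetical and are not taken from tcerl or sext. The sketch only handles 32-bit signed integers. Erlang compares binaries byte by byte, and a binary that is a prefix of a longer one sorts first, so `<` on binaries stands in for memcmp here. It shows that term_to_binary/1 does not preserve order, while a simple offset encoding does.

```erlang
-module(order_sketch).
-export([enc_int/1, check/0]).

%% Hypothetical helper, not part of tcerl or sext: encode a signed
%% 32-bit integer so that plain byte comparison agrees with numeric
%% order. Adding 2^31 maps [-2^31, 2^31) onto [0, 2^32); a fixed-width
%% big-endian layout then sorts byte-wise.
enc_int(I) when is_integer(I), I >= -16#80000000, I < 16#80000000 ->
    <<(I + 16#80000000):32/big-unsigned>>.

check() ->
    Ints = [-300, -1, 0, 1, 255, 256, 70000],
    %% term_to_binary/1 is not order-preserving: -1 is written with the
    %% INTEGER_EXT tag (98) and 1 with SMALL_INTEGER_EXT (97), so the
    %% encoding of -1 compares greater than the encoding of 1.
    false = term_to_binary(-1) < term_to_binary(1),
    %% Sorting by the offset encoding gives the numeric order.
    Sorted = lists:sort(Ints),
    Sorted = [I || {_, I} <- lists:sort([{enc_int(I), I} || I <- Ints])],
    ok.
```

With an encoding like this, the storage engine never needs to know about Erlang terms. It just compares bytes.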

-- p

On Sat, 31 Oct 2009, Ulf Wiger wrote:

> Ulf Wiger wrote:
> >
> > A while ago I started hacking on a serialization format that
> > would have the same sorting properties as Erlang terms.
> >
> > I didn't quite get it to work (negative floats was the most
> > difficult part), but when I returned to it today, I realized
> > that it was only a very small problem. Once fixed, all my
> > QuickCheck suites passed.
>
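Negative floats are hard because IEEE 754 doubles are sign-magnitude. Read as unsigned bytes, larger negative magnitudes come out larger, and every negative value sorts above every positive one. The sketch below uses one standard fix. It is not necessarily what sext does, and the module float_key is hypothetical. The fix sets the sign bit on positive values and inverts all bits of negative values.

```erlang
-module(float_key).
-export([enc_float/1, check/0]).

%% Hypothetical helper, not taken from sext: map an IEEE 754 double to
%% 8 bytes whose unsigned big-endian order matches numeric order.
%% Positive floats already sort correctly as unsigned integers; setting
%% the sign bit lifts them above every negative key. Negative floats
%% sort backwards (larger magnitude gives larger bits), so all 64 bits
%% are inverted, which also clears the sign bit.
enc_float(F) when is_float(F) ->
    <<Bits:64>> = <<F:64/float>>,
    Key = case Bits bsr 63 of
              1 -> Bits bxor 16#FFFFFFFFFFFFFFFF;
              0 -> Bits bor 16#8000000000000000
          end,
    <<Key:64>>.

check() ->
    %% Zero is left out on purpose: 0.0 and -0.0 get different keys.
    Fs = [3.0e20, -2.5, 1.0, -1.0e10, 0.5, -0.5, -1.0],
    Sorted = lists:sort(Fs),
    Sorted = [F || {_, F} <- lists:sort([{enc_float(F), F} || F <- Fs])],
    ok.
```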
> I just had to try this on Tokyo Tyrant, so I wrote a small
> prototype for connecting to TT and encoding a few requests,
> using the sext library to encode terms before sending them.
>
> I realized that a new function was needed in sext: prefix(Term),
> which encodes a 'prefix' that will match similar terms, and allow
> some wildcarding. A prefix can't be decoded (at least, I didn't
> write any code for doing so).
>
> Some examples:
>
> Eshell V5.7.1  (abort with ^G)
> 1> sext:encode({1,2,3}).
> <<16,0,0,0,3,10,0,0,0,2,10,0,0,0,4,10,0,0,0,6>>
> 2> sext:prefix({1,'_','_'}).
> <<16,0,0,0,3,10,0,0,0,2>>
> 3> sext:encode([1,2,3]).
> <<17,10,0,0,0,2,10,0,0,0,4,10,0,0,0,6,0>>
> 4> sext:prefix([1,2|'_']).
> <<17,10,0,0,0,2,10,0,0,0,4>>
>
>
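In the session above, each prefix is a leading byte sequence of the corresponding encoded term. The sketch below checks this using the byte strings copied from that session. The module prefix_check and its is_prefix/2 are hypothetical helpers, not part of sext.

```erlang
-module(prefix_check).
-export([is_prefix/2, check/0]).

%% Hypothetical helper, not part of sext: true if Prefix is a leading
%% byte sequence of Bin.
is_prefix(Prefix, Bin) when is_binary(Prefix), is_binary(Bin) ->
    byte_size(Prefix) =< byte_size(Bin)
        andalso binary:part(Bin, 0, byte_size(Prefix)) =:= Prefix.

check() ->
    %% Byte strings copied from the sext:encode/1 and sext:prefix/1
    %% shell output in the post.
    TupleKey = <<16,0,0,0,3,10,0,0,0,2,10,0,0,0,4,10,0,0,0,6>>,
    TuplePfx = <<16,0,0,0,3,10,0,0,0,2>>,
    ListKey  = <<17,10,0,0,0,2,10,0,0,0,4,10,0,0,0,6,0>>,
    ListPfx  = <<17,10,0,0,0,2,10,0,0,0,4>>,
    true  = is_prefix(TuplePfx, TupleKey),
    true  = is_prefix(ListPfx, ListKey),
    %% A prefix never sorts after a key that extends it...
    true  = TuplePfx =< TupleKey,
    %% ...and the tuple prefix does not match the list encoding.
    false = is_prefix(TuplePfx, ListKey),
    ok.
```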
> Armed with this, I opened a B-tree table in Tokyo Tyrant,
> and connected to it with my prototype module.
>
> Eshell V5.7.1  (abort with ^G)
> 1> {ok,TT} = tt_proto:open(tt,[]).
> {ok,<0.35.0>}
> 2> tt_proto:put(TT,{1,a}, 1).
> ok
> 3> tt_proto:get(TT,{1,a}).
> {ok,1}
> 4> tt_proto:put(TT,{1,b}, 2).
> ok
> 5> tt_proto:put(TT,{1,c}, 3).
> ok
> 6> tt_proto:put(TT,{2,a}, 4).
> ok
>
> Now, for some prefix matching:
>
> 7> tt_proto:keys(TT,{1,'_'}).
> {ok,[{1,a},{1,b},{1,c}]}
> 8> tt_proto:keys(TT,{2,'_'}).
> {ok,[{2,a}]}
> 9> timer:tc(tt_proto,keys,[TT,{1,'_'}]).
> {279,{ok,[{1,a},{1,b},{1,c}]}}
>
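A prefix query like tt_proto:keys(TT, {1,'_'}) can be served by an ordered store as seek-then-scan. This works because, in byte-wise order, all keys that start with a given prefix form one contiguous run that begins at or after the prefix itself. The sketch below illustrates the idea. It is not how tt_proto is implemented, which the post does not say. The module prefix_scan is hypothetical, and a sorted list of binaries stands in for a B-tree cursor. With sext, Prefix would be a sext:prefix/1 result and the keys would be sext:encode/1 results.

```erlang
-module(prefix_scan).
-export([scan/2]).

%% Hypothetical sketch of a prefix query over an ordered key space.
%% Every key starting with Prefix sorts at or after Prefix, and any key
%% >= Prefix that does not start with it sorts after all of them, so:
%% skip to the first key >= Prefix, then read forward while keys still
%% start with Prefix.
scan(Prefix, SortedKeys) ->
    Len = byte_size(Prefix),
    Rest = lists:dropwhile(fun(K) -> K < Prefix end, SortedKeys),
    lists:takewhile(
      fun(K) -> binary:longest_common_prefix([Prefix, K]) =:= Len end,
      Rest).
```

For example, prefix_scan:scan(<<1>>, [<<0>>, <<1>>, <<1,5>>, <<1,9>>, <<2>>]) returns [<<1>>, <<1,5>>, <<1,9>>]. A real B-tree replaces the dropwhile with a seek, so the cost depends on the number of matches rather than the table size.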
> I made no real effort to optimize anything. The module starts
> a gen_server which keeps a connection open to ttserver. It handles
> only one query at a time, but looking at the TCP protocol, it's
> hard to see how it could do otherwise, as there is no tagging of
> requests. The round trip times are going to be fairly high for simple
> requests (compared to dets and mnesia on small data sets), but the
> main benefit of using TT in the first place ought to be either that
> the data set is uncomfortably large for mnesia and dets, or that one
> wants ordered_set semantics on disk-based storage.
>
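The one-query-at-a-time design follows from the untagged protocol. A reply can only be paired with its request if at most one request is in flight on the connection. The sketch below shows that structure. It is not the tt_proto code. The module serial_conn is hypothetical, and RoundTrip is a placeholder fun standing in for "send one request and read its complete reply". Because gen_server handles one call at a time, callers are serialized without extra locking.

```erlang
-module(serial_conn).
-behaviour(gen_server).
-export([start_link/1, request/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical sketch: one gen_server owns the single connection.
%% RoundTrip is a placeholder for a blocking send-and-receive on that
%% connection; the real protocol handling is not shown.
start_link(RoundTrip) when is_function(RoundTrip, 1) ->
    gen_server:start_link(?MODULE, RoundTrip, []).

%% Callers block here until their own request has been answered.
request(Server, Req) ->
    gen_server:call(Server, {request, Req}).

init(RoundTrip) ->
    {ok, RoundTrip}.

%% handle_call/3 runs for one message at a time, so only one request is
%% ever on the wire.
handle_call({request, Req}, _From, RoundTrip) ->
    {reply, RoundTrip(Req), RoundTrip}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

If throughput matters more than per-request latency, one option is to run several such servers, each with its own connection.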
> I put the tt_proto module in sext/examples/
> There is some edoc for it too.
>
> http://svn.ulf.wiger.net/sext/trunk/sext/doc/index.html
>
> BR,
> Ulf W
>


