[erlang-questions] [ANN] LETS - LevelDB-based Erlang Term Storage v0.5.3

Wed Nov 23 03:32:33 CET 2011

On Wed, Nov 23, 2011 at 01:30, Ciprian Dorin Craciun
<ciprian.craciun@REDACTED> wrote:
> On Mon, Nov 21, 2011 at 13:59, Joseph Norton <norton@REDACTED> wrote:
>>
>> Ciprian -
>>
>> For encoding and then returning keys and values to the Erlang virtual machine, leveldb::Slice() is sufficient.  I didn't run any comparison tests, but my assumption is that using std::string() requires (should require?) an extra memory allocation, copy, and deallocation for the std::string() object itself.
>>
>> Joe N.

    Small observation: I've quickly hacked the LevelDB implementation
so that it takes a `Slice` as the value (all the way down to the core,
where it is actually copied), and thus I've saved the `std::string`
allocation.

    But in my preliminary benchmarks, I see no difference in
performance. (Maybe I'm throttled by the test harness...)

    Ciprian.


>    Hi again!
>
>    So on my Go bindings I've done a small benchmark: implementing
> `get` in terms of `Get(Slice&,std::string*)` or in terms of
> `NewIterator() / Seek(Slice&) / Compare(Slice&)`, and I got some
> quite interesting results:
>    * on small sets (100k) it seems that if the key exists there is no
> noticeable performance difference;
>    * but on large sets (1m) the iterator-based variant is about 2x
> slower;
>    * and it gets worse when the key does not exist; (the performance
> drops to about a couple of ops per second...)
>
>    My experiment was as follows:
>    * step1) put 1m pairs, each a little-endian unsigned 64-bit
> key / value (the keys run from 0 to 1m, and the value is the key
> squared);
>    * step2) get 1m pairs;
>    * step3) delete those for which `key & pattern == 0`;
>    * step4) re-get 1m and verify whether they should exist and what
> they hold;
>    * I use little endian to mix the keys a little bit;
>    * I do not reopen the database between the four steps; I do reopen
> the database for each experiment;
>    * all experiments are done over tmpfs (without swapping) and each
> experiment starts with a fresh database;
>    * the values are computed by dividing the total number of
> operations by the overall time; (the actual speed varies over time
> as a result of the workload pattern...)
>    * in the case of the re/get experiment I don't let it run more
> than 20 seconds;
>    * (take into account that the benchmark is "driven" by Go and it
> has some overhead, but the Go call path is identical in both setups,
> thus it doesn't influence the trend;)
>    * the delete speed varies because I only count a delete when I
> actually perform one, but I still need to go through the entire key
> range;
>
>    Results:
> ~~~~
> # get as `NewIterator()/Seek()/Compare()`
> del-pattern | put/s | get/s | del/s | reget/s
> 0x00 -- all |   47k |   17k |   34k |       4
> 0x10 -- 50% |   50k |   17k |   36k |     200
> 0x70 -- 14% |   48k |   17k |   38k |     762
> 0xf0 --  6% |   50k |   17k |   46k |    1928
> 0xf... none |   50k |   17k |   --  |   17k
> ~~~~
> # get as `Get(Slice&,std::string*)`
> del-pattern | put/s | get/s | del/s | reget/s
> 0x00 -- all |   48k |   44k |   42k |   47k
> 0x10 -- 50% |   49k |   43k |   64k |   48k
> 0x70 -- 14% |   38k |   43k |   60k |   42k
> 0xf0 --  6% |   49k |   42k |   31k |   45k
> 0xf... none |   37k |   43k |   --  |   43k
> ~~~~
>
>    Hope you find it useful,
>    Ciprian.
>