[erlang-patches] Improving ETS performances
Francesco Lattanzio
franz.lattanzio@REDACTED
Wed Jun 25 21:38:06 CEST 2014
Hello again Sverker,
I've made some progress with this patch. Briefly:
* set tables and bag/duplicate_bag tables are now managed by dedicated
code -- erl_db_hash.[ch] and erl_db_nested_hash.[ch] respectively
* reduced memory overhead
You can compute the difference (in words) between the new and the old
implementations' memory footprints (when nested LHT is not active) of a
collection of 'n' objects sharing the same key as:
MemDiff = 4 - n
I used this formula to fix the ets suite's memory test case.
With regard to the yielding problem, I deciced not to fix it now for the
following reasons:
1) it is unrelated to the nested LHT, as the problem manifests itself
even in the current (non-nested LHT) implementation
2) you offered to fix it yourself
The patch is available at:
git fetch https://github.com/fltt/otp.git ets_nested_lht_3
https://github.com/fltt/otp/compare/erlang:maint...ets_nested_lht_3
https://github.com/fltt/otp/compare/erlang:maint...ets_nested_lht_3.patch
It is based on the maint branch.
I've tested it on Linux x86_64, latest kernel (all the test suites) and
FreeBSD/i386 9.2 (only the emulator, kernel and stdlib test suites).
Have another nice day.
On Fri, Nov 29, 2013 at 04:22:30PM +0100, Sverker Eriksson wrote:
> Hi
>
> This patch has now been discussed by OTP technical board.
>
> It's a desirable feature but the patch is not acceptable as-is,
> mainly due to too large memory overhead per object.
>
> We would need
>
> * No extra overhead for table type 'set'.
> * Minimized overhead per object for 'bag' and 'duplicate_bag',
> especially when the nested hash feature is not used.
>
> Another weakness is the lack of yielding when long key chains are
> traversed. BIFs should not execute too long but rather yield by
> returning back to the emulator loop to be continued later by the
> scheduler (as is done by the ets:select operations).
> The patch removes this problem for unrelated keys but it still arise
> at lookup/delete on a lot of objects with same key.
> If an otherwise accepted patch is supplied we can undertake the job
> to fix the yielding. It can be good to have in mind though while doing
> the design.
>
>
>
> Some tips and tricks to minimize memory overhead:
> (some of them are probably mutually exclusive)
>
> * Use lower bits in pointers as flags.
> * Combine next and kprevious as only one of them is used.
> * Move nkitems and lht into a separate root struct that is allocated
> when the first duplicate key is inserted.
> * Move knext and kprevious to NestedDbTerm to only allocate them when
> needed.
> * Use really dirty trick to allocate a mutilated struct with an unused
> prologue.
>
>
> typedef struct {
> HashDbTerm* previous; /* NOT ACCESSABLE FOR SET */
> HashDbTerm* next;
> HashValue hvalue;
> DbTerm dbterm;
> }HashDbTerm;
>
> /* For bags */
> ptr = malloc(sizeof(HashDbTerm));
>
> /* For sets */
> ptr = malloc(sizeof(HashDbTerm) - offsetof(HashDbTerm,next));
> ptr = (char*)ptr - offsetof(HashDbTerm,next);
>
> ptr->next = ...;
> ptr->hvalue = ...;
> ptr->dbterm = ...;
>
> ptr->previous = NULL; /* MEMORY CORRUPTION!!! */
>
> free((char*)ptr + offsetof(HashDbTerm,next));
>
>
> Maybe there are more sensible ways to accomplish the same thing.
--
Francesco Lattanzio
More information about the erlang-patches
mailing list