[erlang-patches] Improving ETS performances

Francesco Lattanzio franz.lattanzio@REDACTED
Wed Jun 25 21:38:06 CEST 2014


Hello again Sverker,
    I've made some progress with this patch. Briefly:

* set tables and bag/duplicate_bag tables are now managed by dedicated
  code -- erl_db_hash.[ch] and erl_db_nested_hash.[ch] respectively
* reduced memory overhead

You can compute the difference (in words) between the new and the old
implementations' memory footprints (when nested LHT is not active) of a
collection of 'n' objects sharing the same key as:

  MemDiff = 4 - n

I used this formula to fix the ets suite's memory test case.

With regard to the yielding problem, I deciced not to fix it now for the
following reasons:

1) it is unrelated to the nested LHT, as the problem manifests itself
   even in the current (non-nested LHT) implementation
2) you offered to fix it yourself

The patch is available at:

  git fetch https://github.com/fltt/otp.git ets_nested_lht_3

  https://github.com/fltt/otp/compare/erlang:maint...ets_nested_lht_3
  https://github.com/fltt/otp/compare/erlang:maint...ets_nested_lht_3.patch

It is based on the maint branch.

I've tested it on Linux x86_64, latest kernel (all the test suites) and
FreeBSD/i386 9.2 (only the emulator, kernel and stdlib test suites).

Have another nice day.

On Fri, Nov 29, 2013 at 04:22:30PM +0100, Sverker Eriksson wrote:
> Hi
> 
> This patch has now been discussed by OTP technical board.
> 
> It's a desirable feature but the patch is not acceptable as-is,
> mainly due to too large memory overhead per object.
> 
> We would need
> 
> * No extra overhead for table type 'set'.
> * Minimized overhead per object for 'bag' and 'duplicate_bag',
>    especially when the nested hash feature is not used.
> 
> Another weakness is the lack of yielding when long key chains are
> traversed. BIFs should not execute too long but rather yield by
> returning back to the emulator loop to be continued later by the
> scheduler (as is done by the ets:select operations).
> The patch removes this problem for unrelated keys but it still arise
> at lookup/delete on a lot of objects with same key.
> If an otherwise accepted patch is supplied we can undertake the job
> to fix the yielding. It can be good to have in mind though while doing
> the design.
> 
> 
> 
> Some tips and tricks to minimize memory overhead:
> (some of them are probably mutually exclusive)
> 
> * Use lower bits in pointers as flags.
> * Combine next and kprevious as only one of them is used.
> * Move nkitems and lht into a separate root struct that is allocated
>    when the first duplicate key is inserted.
> * Move knext and kprevious to NestedDbTerm to only allocate them when 
> needed.
> * Use really dirty trick to allocate a mutilated struct with an unused 
> prologue.
> 
> 
> typedef struct {
>      HashDbTerm* previous;  /* NOT ACCESSABLE FOR SET */
>      HashDbTerm* next;
>      HashValue  hvalue;
>      DbTerm dbterm;
> }HashDbTerm;
> 
> /* For bags */
> ptr = malloc(sizeof(HashDbTerm));
> 
> /* For sets */
> ptr = malloc(sizeof(HashDbTerm) - offsetof(HashDbTerm,next));
> ptr = (char*)ptr - offsetof(HashDbTerm,next);
> 
> ptr->next = ...;
> ptr->hvalue = ...;
> ptr->dbterm = ...;
> 
> ptr->previous = NULL; /* MEMORY CORRUPTION!!! */
> 
> free((char*)ptr + offsetof(HashDbTerm,next));
> 
> 
> Maybe there are more sensible ways to accomplish the same thing.

-- 
Francesco Lattanzio



More information about the erlang-patches mailing list