[erlang-questions] ETS memory fragmentation after deleting data

Led ledest@REDACTED
Mon Feb 18 14:41:59 CET 2019

Use binary:copy/1 when filling ETS tables with fragments of a binary stream, so that each stored fragment does not keep the whole original binary alive.
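A minimal sketch of the suggestion follows. It assumes the stream arrives as one large binary that is cut into fixed-size chunks. The module name `stream_store`, the function `store_chunks/3` and the `{Index, Chunk}` object layout are hypothetical. Without `binary:copy/1`, each stored chunk would be a sub-binary, and the `binary:copy/1` documentation notes that a binary referencing a larger binary can keep that larger binary from being garbage collected.

```erlang
-module(stream_store).
-export([store_chunks/3]).

%% Hypothetical helper: split Stream into chunks of at most ChunkSize bytes
%% and store each one in Tab as {Index, Chunk}. Returns the number of chunks.
store_chunks(Tab, Stream, ChunkSize) when is_binary(Stream), ChunkSize > 0 ->
    store_chunks(Tab, Stream, ChunkSize, 0).

store_chunks(_Tab, <<>>, _ChunkSize, N) ->
    N;
store_chunks(Tab, Stream, ChunkSize, N) ->
    Size = min(ChunkSize, byte_size(Stream)),
    <<Chunk:Size/binary, Rest/binary>> = Stream,
    %% binary:copy/1 gives the chunk its own compact binary instead of a
    %% sub-binary that would reference (and retain) the whole Stream.
    true = ets:insert(Tab, {N, binary:copy(Chunk)}),
    store_chunks(Tab, Rest, ChunkSize, N + 1).
```

Calling `binary:referenced_byte_size/1` on a chunk read back with `ets:lookup/2` should then report the chunk's own size rather than the size of the whole stream.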

On Thu, 7 Feb 2019 at 16:36, Dániel Szoboszlay <dszoboszlay@REDACTED> wrote:

> Hi,
> I would like to understand some things about ETS memory fragmentation
> after deleting data. My current (probably faulty) mental model of the issue
> looks like this:
>    - For every object in an ETS table a block is allocated on a carrier
>    (typically a multi-block carrier, unless the object is huge).
>    - Besides the objects themselves, the ETS table obviously needs some
>    additional blocks too to describe the hash table data structure. The size
>    of this data should be small compared to the object data, however (since
>    ETS is not terribly space-inefficient), so I won't consider it any further.
>    - If I delete some objects from an ETS table, the corresponding blocks
>    are deallocated. However, the rest of the objects remain in their original
>    location, so the carriers cannot be deallocated (unless all of their
>    objects get deleted).
>    - This implies that deleting a lot of data from ETS tables would lead
>    to memory fragmentation.
>    - Since there's no way to force ETS to rearrange the objects it
>    already stores, the memory remains fragmented until subsequent updates to
>    ETS tables fill the gaps with new objects.
> I wrote a small test program (available here
> <https://gist.github.com/dszoboszlay/921b26a57463ec1f5df1816a840a78aa>)
> to verify my mental model. But it doesn't exactly behave as I expected.
>    1. I create an ETS table and populate it with 1M objects, where each
>    object is 1027 words large.
>    I expect the total ETS memory use to be around 1M * 1027 * 8 bytes ~
>    7835 MiB (the size of all other ETS tables on a newly started Erlang node
>    is negligible).
>    And indeed I see that the total block size is ~7881 MiB and the total
>    carrier size is ~7885 MiB (99.95% utilisation).
>    2. I delete 75% of the objects randomly.
>    I expect the block size to go down by ~75% and the carrier size by
>    some smaller amount.
>    In practice however the block size goes down by 87%, while the carrier
>    size drops by 48% (resulting in a disappointing 25% utilisation).
>    3. Finally, I try to defragment the memory by overwriting each object
>    that was left in the table with itself.
>    I expect this operation to have no effect on the block size, but close
>    the gap between the block size and carrier size by compacting the blocks on
>    fewer carriers.
>    In practice however the block size goes up by 91%(!!!), while the
>    carrier size comes down very close to this new block size (utilisation is
>    back at 99.56%). All in all, compared to the initial state in step 1, both
>    block and carrier sizes are down by 75%.
> So here's the list of things I don't understand or know based on this
> exercise:
>    - How could the block size drop by 87% after deleting 75% of the data
>    in step 2?
>    - Why did overwriting each object with itself result in almost
>    doubling the block size?
>    - Would you consider running a select_replace to compact a table after
>    deletions safe in production? E.g. doing it on a Mnesia table that's
>    several GB in size and is actively used by Mnesia transactions. (I know
>    the replace is atomic on each object, but how would a long running replace
>    affect the execution time of other operations for example?)
>    - Step 3 helped to reclaim unused memory, but it almost doubled the
>    used memory (the block size). I don't know what caused this behaviour, but
>    is there an operation that would achieve the opposite effect? That is,
>    reduce the block size by 45-50% without altering the contents of the table?
> Thanks,
> Daniel
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
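The three steps of the quoted experiment could be sketched roughly as below. This is not the code from the linked gist: the module `ets_frag_demo`, the `{Key, Payload}` object shape, and the use of `ets:select_replace/2` for step 3 are assumptions. The sketch only reports `ets:info(Tab, memory)` (words, this table) and `erlang:memory(ets)` (bytes, all ETS tables); the block and carrier figures quoted above come from allocator statistics (for example `erlang:system_info({allocator, ets_alloc})`), which this sketch does not parse.

```erlang
-module(ets_frag_demo).
-export([run/2]).

%% Hypothetical reconstruction of the experiment (not the gist's code).
%% N objects are stored as {Key, Payload}; Payload is a tuple with
%% PayloadWords - 1 elements, i.e. it occupies PayloadWords heap words.
run(N, PayloadWords) when N > 0, PayloadWords >= 1 ->
    T = ets:new(frag_demo, [set, public]),
    Payload = erlang:make_tuple(PayloadWords - 1, 0),
    Keys = lists:seq(1, N),
    %% Step 1: populate the table (ETS copies Payload into every object).
    lists:foreach(fun(K) -> true = ets:insert(T, {K, Payload}) end, Keys),
    Filled = snapshot(T),
    %% Step 2: delete about 75% of the objects at random.
    lists:foreach(fun(K) -> maybe_delete(T, K, 0.75) end, Keys),
    Deleted = snapshot(T),
    %% Step 3: overwrite every remaining object with itself. The match
    %% spec returns the same key ('$1'), as select_replace/2 requires.
    Rewritten = ets:select_replace(T, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}]),
    Compacted = snapshot(T),
    Remaining = ets:info(T, size),
    true = ets:delete(T),
    #{filled => Filled, deleted => Deleted, compacted => Compacted,
      remaining => Remaining, rewritten => Rewritten}.

%% Delete key K with probability Fraction.
maybe_delete(T, K, Fraction) ->
    case rand:uniform() < Fraction of
        true -> true = ets:delete(T, K);
        false -> true
    end.

%% ets:info/2 reports this table's memory in words;
%% erlang:memory/1 reports bytes for all ETS tables on the node.
snapshot(T) ->
    #{table_words => ets:info(T, memory), ets_bytes => erlang:memory(ets)}.
```

For example, after `c(ets_frag_demo).` in an Erlang shell, `ets_frag_demo:run(100000, 1024).` returns a map with the three snapshots. A run at the quoted scale (1M objects of about 1027 words each) needs roughly 8 GB of memory.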
