[erlang-questions] Wanted additions to the maps module?
Grzegorz Junka
list1@REDACTED
Tue May 10 15:39:57 CEST 2016
Thanks for the comments Björn-Egil. I did consider the concerns that you
mentioned, but I wanted to take a pragmatic approach. I was more
interested in practical implications of using a particular data
structure rather than its idealistic capabilities:
1. All test data used in all tests is generated only _once_. This is to
eliminate variations where specific structures work better with specific
inputs.
2. Tests are run in turns the specified amount of times and the output
is not averaged. This is to verify the reproducibility of the test for
the same input data.
3. Key and Value are both small integers. This is to minimize the
copying in ETS and to expose the overhead of calculating a hash in
hash-based data structures.
4. The data structure is pre-populated with some data before starting to
measure the time. This is to simulate longer-running processes that most
of the time aren't empty.
5. Readings of existing and non-existing keys is interleaved with
storing new keys. This is to reflect practical usage in typical
applications.
6. Each test is run in isolation, in a new process. This is to measure
the consumed memory most accurately - including all generated and not
collected garbage.
> Performance wise Maps usually beats ETS. Apples and Oranges however.
> ETS has other features so it's not a fair comparison.
Yeah, that's what my test concluded. Initially for small key/values maps
are slightly faster. The difference would only grow in favour of maps
for more complex payload because of the copying between ETS and the
process. But maps consume much more memory than ETS. I believe the
memory released when reallocating structures in ETS doesn't need to be
garbage collected since it's implemented in C outside of any process memory?
>> https://gist.github.com/amiramix/d43c9a73a6fe6d651d7f
>>
>> Maps are quite performant but process dictionary is still quicker
>> and maps are the worst when it comes to consumed memory, taking
>> twice as much as dict or process dictionary and over 5 times as
>> much memory as ets.
>>
>> Well, you are comparing apples and oranges. Process dictionary and
>> ETS are something completely different from gb_trees, dict, maps or
>> orddict.
What's the point of comparing gb_tress to gb_trees or maps to maps?
>>
>> I think you should look over how you measure memory. It seems very
>> arbitrary. What are you measuring? The size of the process after it
>> all? The gc can shrink or grow the heap during execution, trying to
>> find a suitable size. Also look at the number of gc:s during the
>> execution, this will indicate how garbage you are generating.
>> Normally benchmarks ends with an explicit garbage_collect before
>> looking at the size. This gives a more fair indication of memory
>> consumption.
>> It all depends on what you want to measure though, so look it over.
>>
>> Another approach would be to use erts_debug:flat_size/1. With the
>> functional data structures in this test you'll get:
>>
>> - dict: 67568
>> - gb_trees: 78668
>> - map_match: 58543
>> - list: 78665
>> - orddict: 78665
>>
>>
I am not measuring the "size of the process after it all". I am
measuring how much the size of an Erlang process grows after storing and
reading 30000 key/value pairs. This is to give me a better understanding
of how much memory I would potentially need to store (for example) a
billion of key-value data elements in one million of Erlang processes
(one thousand of such elements in each Erlang process).
Measuring the flat size maybe useful for developers implementing the
data structure. But does it tell me how much resources (CPU & memory)
the VM can consume, and thus how much hardware I would need to buy to
run such VM reliably?
I appreciate your comments so please let me know if there is anything
you would like to add.
Coming back to the iterator question, some data structures may be better
suitable for implementing such iterator than others. If for now the LUA
table could be implemented either in maps or dict, then maybe if say,
process dictionary supported iterators efficiently, they could well be
implemented using PD instead?
/Grzegorz
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20160510/fad57d26/attachment.htm>
More information about the erlang-questions
mailing list