<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Thanks for the comments Björn-Egil. I did consider the concerns
that you mentioned, but I wanted to take a pragmatic approach. I
was more interested in practical implications of using a
particular data structure rather than its idealistic capabilities:</p>
<p>1. All test data used in all tests is generated only _once_. This
is to eliminate variations where specific structures work better
with specific inputs.<br>
</p>
<p>2. Tests are run in turns the specified amount of times and the
output is not averaged. This is to verify the reproducibility of
the test for the same input data.</p>
<p>3. Key and Value are both small integers. This is to minimize the
copying in ETS and to expose the overhead of calculating a hash in
hash-based data structures.</p>
<p>4. The data structure is pre-populated with some data before
starting to measure the time. This is to simulate longer-running
processes that most of the time aren't empty.<br>
</p>
<p>5. Readings of existing and non-existing keys is interleaved with
storing new keys. This is to reflect practical usage in typical
applications.</p>
<p>6. Each test is run in isolation, in a new process. This is to
measure the consumed memory most accurately - including all
generated and not collected garbage.<br>
</p>
<br>
<br>
<blockquote
cite="mid:F57DB5D2-69BB-4844-B592-78DB212D5B64@ericsson.com"
type="cite">
<div id="AppleMailSignature">Performance wise Maps usually beats
ETS. Apples and Oranges however. ETS has other features so it's
not a fair comparison.<br>
</div>
</blockquote>
<br>
Yeah, that's what my test concluded. Initially for small key/values
maps are slightly faster. The difference would only grow in favour
of maps for more complex payload because of the copying between ETS
and the process. But maps consume much more memory than ETS. I
believe the memory released when reallocating structures in ETS
doesn't need to be garbage collected since it's implemented in C
outside of any process memory?<br>
<br>
<br>
<blockquote
cite="mid:F57DB5D2-69BB-4844-B592-78DB212D5B64@ericsson.com"
type="cite">
<blockquote type="cite">
<div>
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px
0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p><a moz-do-not-send="true"
href="https://gist.github.com/amiramix/d43c9a73a6fe6d651d7f"
target="_blank">https://gist.github.com/amiramix/d43c9a73a6fe6d651d7f</a></p>
<p>Maps are quite performant but process dictionary
is still quicker and maps are the worst when it
comes to consumed memory, taking twice as much as
dict or process dictionary and over 5 times as
much memory as ets.</p>
</div>
</blockquote>
<div>Well, you are comparing apples and oranges. Process
dictionary and ETS are something completely different
from gb_trees, dict, maps or orddict.</div>
</div>
</div>
</div>
</div>
</blockquote>
</blockquote>
<br>
What's the point of comparing gb_tress to gb_trees or maps to maps?<br>
<br>
<br>
<blockquote
cite="mid:F57DB5D2-69BB-4844-B592-78DB212D5B64@ericsson.com"
type="cite">
<blockquote type="cite">
<div>
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>I think you should look over how you measure
memory. It seems very arbitrary. What are you
measuring? The size of the process after it all? The
gc can shrink or grow the heap during execution,
trying to find a suitable size. Also look at the
number of gc:s during the execution, this will
indicate how garbage you are generating. Normally
benchmarks ends with an explicit garbage_collect
before looking at the size. This gives a more fair
indication of memory consumption.</div>
<div>It all depends on what you want to measure though,
so look it over. </div>
<div><br>
</div>
<div>Another approach would be to use
erts_debug:flat_size/1. With the functional data
structures in this test you'll get:</div>
<div><br>
</div>
<div>- dict: 67568</div>
<div>- gb_trees: 78668</div>
<div>- map_match: 58543</div>
<div>- list: 78665</div>
<div>- orddict: 78665</div>
<div><br>
</div>
<br>
</div>
</div>
</div>
</div>
</blockquote>
</blockquote>
<br>
I am not measuring the "size of the process after it all". I am
measuring how much the size of an Erlang process grows after storing
and reading 30000 key/value pairs. This is to give me a better
understanding of how much memory I would potentially need to store
(for example) a billion of key-value data elements in one million of
Erlang processes (one thousand of such elements in each Erlang
process).<br>
<br>
Measuring the flat size maybe useful for developers implementing the
data structure. But does it tell me how much resources (CPU &
memory) the VM can consume, and thus how much hardware I would need
to buy to run such VM reliably?<br>
<br>
I appreciate your comments so please let me know if there is
anything you would like to add.<br>
<br>
Coming back to the iterator question, some data structures may be
better suitable for implementing such iterator than others. If for
now the LUA table could be implemented either in maps or dict, then
maybe if say, process dictionary supported iterators efficiently,
they could well be implemented using PD instead?<br>
<br>
/Grzegorz<br>
<br>
</body>
</html>