[erlang-questions] Private ets table vs. Process directory

Charles Hixson charleshixsn@REDACTED
Fri Feb 9 03:17:20 CET 2018


The problem with quantifying those numbers is I've got several different 
plausible designs for the system, and they have different values for 
those numbers.

E.g., one design called for one process per CPU core.  In that design 
each process would need an ets table and a mnesia table.  The mnesia 
table would be disk-only.  The ets table would hold perhaps 100,000 
entries, each of which would maintain a time stamp for time of last 
access.  When the table started getting full, stale entries would need 
to be rolled out to the database and purged.
This design uses a lot less RAM, and a far smaller number of 
processes.
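
The ets side of that design (entries stamped with their time of last access, stale ones rolled out and purged) could look roughly like the sketch below. The module and function names (entry_cache, store/3, fetch/2, evict_stale/3) are made up for illustration, and the roll-out step is a caller-supplied fun standing in for whatever would write to the mnesia table.

```erlang
-module(entry_cache).
-export([new/0, store/3, fetch/2, evict_stale/3]).

%% Create a private set table; only the owning process can read or write it.
new() ->
    ets:new(entry_cache, [set, private]).

%% Store Value under Key, stamped with the current monotonic time in ms.
store(Tab, Key, Value) ->
    ets:insert(Tab, {Key, Value, now_ms()}).

%% Look up Key and refresh its last-access stamp on a hit.
fetch(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{Key, Value, _LastAccess}] ->
            ets:update_element(Tab, Key, {3, now_ms()}),
            {ok, Value};
        [] ->
            not_found
    end.

%% Roll out and purge every entry not accessed within the last MaxAgeMs.
%% RollOut is a hypothetical fun(Key, Value), e.g. a write to disk storage.
%% Returns the number of entries purged.
evict_stale(Tab, MaxAgeMs, RollOut) ->
    Cutoff = now_ms() - MaxAgeMs,
    MatchSpec = [{{'$1', '$2', '$3'},
                  [{'<', '$3', Cutoff}],
                  [{{'$1', '$2'}}]}],
    Stale = ets:select(Tab, MatchSpec),
    lists:foreach(fun({K, V}) ->
                          RollOut(K, V),
                          ets:delete(Tab, K)
                  end, Stale),
    length(Stale).

now_ms() ->
    erlang:monotonic_time(millisecond).
```

Because the table is private, all four functions have to be called from the process that created it.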

Now the tables would be keyed by a programmatically generated key value 
to allow unique items to be referred to when they are rolled out, so 
that it's possible to roll them back in.
In this design there would be perhaps (at a wild guess!!) one i/o 
operation for every sqrt (# of CPUs * # of entries/table) function 
calls.  But they would tend to come in bursts, so i/o would definitely 
slow things down considerably.
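
To put a number on that guess: assuming, purely for illustration, a machine with 8 CPUs and the 100,000 entries per table mentioned above, the estimate works out to one i/o operation per roughly 894 function calls. The module name io_estimate is hypothetical.

```erlang
-module(io_estimate).
-export([calls_per_io/2]).

%% Estimated function calls per i/o operation:
%% sqrt(number of CPUs * entries per table).
calls_per_io(Cpus, EntriesPerTable) ->
    math:sqrt(Cpus * EntriesPerTable).
```

For example, io_estimate:calls_per_io(8, 100000) evaluates sqrt(800,000), which is about 894.4.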

Well, that design wasn't optimized for Erlang.  I've been contemplating 
variations of it over many different languages.

Now if I can have one process/entry, and if dormant processes (waiting 
for a receive) can sleep in virtual memory, then I will need a few 
million processes, but I will be able to let the system manage the 
activation/sleep cycle of the processes.  In that case each external 
signal is likely to induce around 1,000 to 100,000 activations in a 
chain before relaxing into a settled state.  Each activation will likely 
only cause about 500 bytes of data to be copied (another wild guess, 
with part of the uncertainty being how many internal pointers Erlang 
will need to adjust).  But in this case the data can be passed on the 
stack, and tail calls can be used.
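
A minimal sketch of the one-process-per-entry idea follows. The names (entry_proc, start/1, activate/2, value/1) and the one-second idle timeout are assumptions for illustration only. Note that the Erlang VM does not itself page dormant processes out to virtual memory; the closest built-in is erlang:hibernate/3, which garbage-collects the process and shrinks its heap until the next message arrives. Any actual swapping is left to the operating system.

```erlang
-module(entry_proc).
-export([start/1, activate/2, value/1]).
-export([loop/1]).  % exported only so erlang:hibernate/3 can re-enter it

%% Spawn one process holding the state for a single entry.
start(Value) ->
    spawn(fun() -> loop(Value) end).

%% Asynchronously replace the entry's state with Data.
activate(Pid, Data) ->
    Pid ! {activate, Data},
    ok.

%% Synchronously read the entry's current state.
value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive
        {Ref, Value} -> Value
    after 5000 ->
        timeout
    end.

%% The state lives in the argument of a tail-recursive receive loop.
loop(Value) ->
    receive
        {activate, Data} ->
            loop(Data);
        {value, From, Ref} ->
            From ! {Ref, Value},
            loop(Value)
    after 1000 ->
        %% Idle for a second: hibernate until the next message arrives,
        %% at which point loop/1 is called again with the same state.
        erlang:hibernate(?MODULE, loop, [Value])
    end.
```

In a real system each activation would presumably forward messages to neighbouring entries to produce the chain described above; that part is omitted here.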

My problem has been that when I searched for limits on the number of 
Erlang processes I got:
"The maximum number of simultaneously alive Erlang processes is by 
default 32,768. This limit can be configured at startup. For more 
information, see the +P 
<http://erlang.org/doc/man/erl.html#max_processes> command-line flag in 
the erl(1) <http://erlang.org/doc/man/erl.html> manual page in ERTS."
and:
"The best thing to do is create a lagom number of processes. Not too 
many, not too few."
and:
"The actual scalability achieved depends on your problem, on your design 
choices, and on the underlying execution framework.

"Erlang has some things going for it, and while synthetic benchmarks have 
been produced, e.g. that show linear scalability within one node up to 
some 30-40 cores, and linear scalability in an Erlang cluster up to 100 
nodes and a total of 1200 cores, the scalability story in Erlang is not 
so much about that, as it is about achieving real-world scalability in 
systems that actually do something useful."

Since I have a single system, this left me with the impression that I 
shouldn't use too many processes, and the best guess of the system at a 
reasonable maximum was a bit under 32,768.  It *was* clear that I could 
raise that limit, but raising it by more than an order of magnitude, 
while allowed, appeared probably unwise.

It appears now that this was a mistaken assumption, but I still don't 
see why I should have guessed differently.
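
For reference, the limit in question can be inspected from inside a running node, and raised at startup with the +P flag (for example `erl +P 5000000`). The sketch below uses only erlang:system_info/1; the module name proc_limits is hypothetical. Two hedges: the runtime may choose a somewhat larger limit than the one requested, and more recent versions of the erl documentation give a default of 262,144 rather than the 32,768 quoted above, so the figure depends on the release.

```erlang
-module(proc_limits).
-export([report/0]).

%% Return the configured process limit and the number of processes
%% currently alive on this node.
report() ->
    #{limit => erlang:system_info(process_limit),
      count => erlang:system_info(process_count)}.
```

Calling proc_limits:report() on a node started with a larger +P value shows whether the new limit took effect.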

On 02/08/2018 12:05 PM, Joe Armstrong wrote:
> In order to even think about your question I'd need certain data -
> words like "huge" as in "huge amounts of copying" and "limited numbers
> of processes"
> etc. do not convey much meaning.
>
> Huge means different things to different people - to some people Huge
> means Gbytes
> (I talked the other day to somebody who used the word Huge - and I
> said "how big"
> he said tens of PetaBytes)
>
> To me huge means a data structure that is larger than the RAM on my machine
> (which is 16GB) - so not only do you have to say what you meant by huge but also
> how your numbers relate to your machine(s).
>
> Also how long do you have to do what? - Handling huge amounts of data
> is easy if you have big enough disks and enough time - you also need
> to say (roughly) how long you have to do what (are we talking seconds,
> milliseconds,
> hours, days???)
>
> The more numbers you add to questions like this the better answers
> you'll get :-)
>
> Cheers
>
> /Joe
>
>
>
> On Wed, Feb 7, 2018 at 5:56 PM, Charles Hixson
> <charleshixsn@REDACTED> wrote:
>> When should a private ets table be preferred over the process directory?
>>
>> To give some context, I'm expecting to have nearly as many processes as I can
>> run, and that each one will need internal mutable state.  Also, that the
>> mutable state will be complex (partially because of the limited number of
>> processes), so passing the state as function parameters would entail huge
>> amounts of copying.  (Essentially I'd be modifying nodes deep within trees.)
>>
>> Mutable state would allow me to avoid the copying, and the state is not
>> exported from the process.  I'm concerned that a huge number of private ets
>> tables would use excessive memory, decreasing the number of processes I
>> could use...but all the references keep saying not to use the process
>> directory.
>>
>> I'm still designing things now, so this is the ideal time to decide.  An
>> alternative is that I could use a single public ets table, with each process
>> only accessing its own data, but I suspect that might involve a lot of
>> locking overhead, even though in principle nothing should need to be locked.
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
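
For the original question quoted above, the two options can be sketched side by side. What the thread calls the "process directory" is the process dictionary, accessed with the put/2, get/1 and erase/1 BIFs. The module name state_demo and the counter key are made up for illustration. One difference matters for the copying concern: terms stored in an ets table are copied into the table on insert and copied back out on lookup, whereas process dictionary values stay on the process heap. It is also worth keeping in mind that a purely functional update of a node deep inside a tree rebuilds only the path from the root to that node, not the whole tree.

```erlang
-module(state_demo).
-export([with_pdict/0, with_private_ets/0]).

%% Process dictionary: put/2, get/1 and erase/1 act on the calling
%% process only. erase/1 returns the value it removes.
with_pdict() ->
    put(counter, 0),
    put(counter, get(counter) + 1),
    erase(counter).

%% Private ets table: readable and writable only by the owning process.
with_private_ets() ->
    Tab = ets:new(state, [set, private]),
    true = ets:insert(Tab, {counter, 0}),
    %% Atomically add 1 to the element after the key; returns the new value.
    New = ets:update_counter(Tab, counter, 1),
    true = ets:delete(Tab),
    New.
```

Both functions return 1; the difference lies in where the data lives and what gets copied, not in the result.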
