<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    The problem with quantifying those numbers is that I have several
    different plausible designs for the system, and each of them gives
    different values for those numbers.<br>

    <br>

    E.g., one design called for one process per CPU.  In that
    design each process would need an ets table and a mnesia table.  The
    mnesia table would be disk-only.  The ets table would hold perhaps
    100,000 entries, each of which would carry a time stamp recording its
    last access.  When the table started getting full, stale entries
    would need to be rolled out to the database and purged.  <br>
    This design uses much less RAM and a far smaller number of
    processes.<br>

    <br>
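    A minimal sketch of that roll-out scheme, written in Python purely for
    illustration (it is not the ets or mnesia API): one dict stands in for
    the per-process ets table, a second dict stands in for the disk-only
    mnesia table, and the class name, the purge fraction and the injectable
    clock are all hypothetical choices.  Each entry carries a last-access
    time stamp; when the table is full the stalest entries are written to
    the store and purged, and a generated key lets an item be rolled back
    in later.<br>
<pre>
```python
import heapq
import itertools
import time


class RollOutTable:
    """Hypothetical stand-in for a per-process ets table whose stale
    entries are rolled out to a disk-only store (here just a dict)."""

    def __init__(self, capacity, purge_fraction=0.1, clock=time.monotonic):
        self.capacity = capacity
        # How many of the stalest entries to roll out when the table is full.
        self.purge_count = max(1, int(capacity * purge_fraction))
        self.clock = clock
        self.live = {}    # key -> (last_access, value), the in-RAM table
        self.store = {}   # key -> value, standing in for the disk-only table
        self._next_id = itertools.count()

    def new_key(self):
        """Generate a unique key so a rolled-out item can be found again."""
        return next(self._next_id)

    def put(self, key, value):
        if key not in self.live and len(self.live) >= self.capacity:
            self._roll_out()
        self.live[key] = (self.clock(), value)

    def get(self, key):
        if key in self.live:
            value = self.live[key][1]
        else:
            # Roll the item back in; raises KeyError for an unknown key.
            value = self.store.pop(key)
            if len(self.live) >= self.capacity:
                self._roll_out()
        self.live[key] = (self.clock(), value)  # refresh the time stamp
        return value

    def _roll_out(self):
        """Write the least recently accessed entries to the store, then purge them."""
        stalest = heapq.nsmallest(
            self.purge_count, self.live, key=lambda k: self.live[k][0]
        )
        for key in stalest:
            self.store[key] = self.live.pop(key)[1]


if __name__ == "__main__":
    ticks = itertools.count()
    table = RollOutTable(capacity=3, purge_fraction=0.34, clock=ticks.__next__)
    keys = [table.new_key() for _ in range(4)]
    for key, value in zip(keys[:3], "abc"):
        table.put(key, value)
    table.get(keys[0])         # touch key 0 so key 1 becomes the stalest
    table.put(keys[3], "d")    # table is full: key 1 is rolled out
    print(sorted(table.live), sorted(table.store))
    print(table.get(keys[1]))  # rolled back in by its key
```
</pre>
    Purging a batch of stale entries at once, rather than one at a time,
    amortizes the search for the oldest time stamps, at the cost of
    making the writes to the store arrive in bursts.<br>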

    Now, the tables would be keyed by a programmatically generated key,
    so that each item can still be referred to uniquely after it is
    rolled out, which makes it possible to roll it back in.  <br>
    In this design there would be perhaps (at a wild guess!!) one i/o
    operation for every sqrt(# of CPUs * # of entries/table) function
    calls.  But the i/o operations would tend to come in bursts, so they
    would definitely slow things down considerably.<br>

    <br>
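    To make that guess concrete: with 100,000 entries per table, the rule
    of thumb gives the following calls-per-i/o figures for a few machine
    sizes.  The CPU counts are hypothetical, and the formula is only the
    wild guess stated above, not a measured result.<br>
<pre>
```python
import math


def calls_per_io(num_cpus, entries_per_table):
    """Function calls per i/o operation under the guessed rule of thumb
    'one i/o per sqrt(#CPUs * #entries/table) calls'."""
    return math.sqrt(num_cpus * entries_per_table)


if __name__ == "__main__":
    entries = 100_000  # entries per ets table, from the design above
    for cpus in (4, 8, 16):  # hypothetical machine sizes
        n = calls_per_io(cpus, entries)
        print(f"{cpus} CPUs: ~1 i/o per {n:.0f} calls ({100 / n:.2f}% of calls)")
```
</pre>
    Even at well under one percent of calls, each i/o is a disk access,
    so the bursts are what would dominate the running time.<br>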

    Well, that design wasn't optimized for Erlang.  I've been
    contemplating variations of it in many different languages.  <br>

    <br>

    Now if I can have one process per entry, and if dormant processes
    (waiting in a receive) can sleep in virtual memory, then I will
    need a few million processes, but I will be able to let the system
    manage the activation/sleep cycle of the processes.  In that case
    each external signal is likely to induce around 1,000 to 100,000
    activations in a chain before relaxing into a settled state.  Each
    activation will likely cause only about 500 bytes of data to be
    copied (another wild guess, with part of the uncertainty being how
    many internal pointers Erlang will need to adjust).  But in this
    case the data can be passed on the stack, and tail calls can be
    used.<br>

    <br>
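    Putting those guesses into numbers: the copying per external signal
    follows directly from the figures above (1,000 to 100,000 activations
    at about 500 bytes each).  The per-process memory figure is an
    assumption of a few KB, not something stated here; the real value
    depends on the OTP version and on how much heap each process keeps,
    so the footprint lines are only an order-of-magnitude check against
    a 16 GB machine.<br>
<pre>
```python
def bytes_copied_per_signal(activations, bytes_per_activation=500):
    """Bytes copied while one external signal ripples through a chain of
    activations (500 bytes per activation is the guess from the text)."""
    return activations * bytes_per_activation


def idle_footprint(num_processes, bytes_per_process):
    """RAM needed just for the processes themselves, before per-entry data."""
    return num_processes * bytes_per_process


if __name__ == "__main__":
    for activations in (1_000, 100_000):
        mb = bytes_copied_per_signal(activations) / 1e6
        print(f"{activations} activations: {mb} MB copied per signal")

    # ASSUMPTION: a few KB per idle process; measure the real figure with
    # erlang:process_info(Pid, memory) on the target system.
    assumed_bytes_per_process = 2_700
    for processes in (1_000_000, 3_000_000):
        gb = idle_footprint(processes, assumed_bytes_per_process) / 1e9
        print(f"{processes} processes: ~{gb:.1f} GB before any entry data")
```
</pre>
    Under that assumption, a few million mostly idle processes would
    already use a large share of 16 GB before any per-entry state is
    counted, which is why the per-process figure is worth measuring.<br>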

    My problem has been that when I searched for limits on the number of

    Erlang processes I got:<br>

    <i>The maximum number of simultaneously alive Erlang processes is by

      default 32,768. This limit can be configured at startup. For more

      information, see the </i><i><span class="bold_code bc-13"><a

          href="http://erlang.org/doc/man/erl.html#max_processes"><span

            class="code">+P</span></a></span></i><i> command-line flag

      in the </i><i><span class="bold_code bc-18"><a

          href="http://erlang.org/doc/man/erl.html"><span class="code">erl(1)</span></a></span></i><i>

      manual page in ERTS.</i><br>

    and:<br>

    <i>The best thing to do is create a lagom number of processes. Not

      too many, not too few.</i><br>

    and:<br>

    <i><span class="ui_qtext_rendered_qtext">The actual scalability

        achieved depends on your problem, on your design choices, and on

        the underlying execution framework.<br>

        <br>

        Erlang has some things going for it, and while synthetic

        benchmarks have been produced, e.g. that show linear scalability

        within one node up to some 30-40 cores, and linear scalability

        in an Erlang cluster up to 100 nodes and a total of 1200 cores,

        the scalability story in Erlang is not so much about that, as it

        is about achieving real-world scalability in systems that

        actually do something useful.<br>

        <br>

      </span></i><span class="ui_qtext_rendered_qtext">Since I have a
      single system, this left me with the impression that I shouldn't
      use too many processes, and that the system's best guess at a
      reasonable maximum was a bit under 32,768.  It *was* clear that I
      could raise that limit, but raising it by more than an order of
      magnitude, while allowed, seemed likely to be unwise.<br>
      <br>
      It now appears that this was a mistaken assumption, but I still
      don't see why I should have guessed differently.<br>
      <br>
    </span>On

    02/08/2018 12:05 PM, Joe Armstrong wrote:<br>

    <blockquote type="cite"

cite="mid:CAANBt-pzhjsCoBXk+QNN7D26SnThx_4nvRV-=B4JXo9LCj2=hA@mail.gmail.com">

      <pre wrap="">In order to even think about your question I'd need certain data -

words like "huge" as in "huge amounts of copying" and "limited numbers

of processes"

etc. do not convey much meaning.

Huge means different things to different people - to some people Huge

means Gbytes

(I talked the other day to somebody who used the word Huge - and I

said "how big"

he said tens of PetaBytes)

To me huge means a data structure that is larger than the RAM on my machine

(which is 16GB) - so not only do you have to say what you meant by huge but also

how your numbers relate to your machine(s).

Also how long do you have to do what? - Handling huge amounts of data

is easy if you have big enough disks and enough time - you also need

to say (roughly) how long you have to do what (are we talking seconds,

milliseconds,

hours, days???)

The more numbers you add to questions like this the better answers

you'll get :-)

Cheers

/Joe

On Wed, Feb 7, 2018 at 5:56 PM, Charles Hixson

<a class="moz-txt-link-rfc2396E" href="mailto:charleshixsn@earthlink.net">&lt;charleshixsn@earthlink.net&gt;</a> wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">When should a private ets table be preferred over the process dictionary?

To give some context, I'm expecting to have nearly as many processes as I can

run, and that each one will need internal mutable state.  Also, that the

mutable state will be complex (partially because of the limited number of

processes), so passing the state as function parameters would entail huge

amounts of copying.  (Essentially I'd be modifying nodes deep within trees.)

Mutable state would allow me to avoid the copying, and the state is not

exported from the process.  I'm concerned that a huge number of private ets

tables would use excessive memory, decreasing the number of processes I

could use...but all the references keep saying not to use the process

dictionary.

I'm still designing things now, so this is the ideal time to decide.  An

alternative is that I could use a single public ets table, with each process

only accessing its own data, but I suspect that might involve a lot of

locking overhead, even though in principle nothing should need to be locked.

_______________________________________________

erlang-questions mailing list

<a class="moz-txt-link-abbreviated" href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a>

<a class="moz-txt-link-freetext" href="http://erlang.org/mailman/listinfo/erlang-questions">http://erlang.org/mailman/listinfo/erlang-questions</a>

</pre>

      </blockquote>


    </blockquote>

    <br>

  </body>

</html>