[erlang-questions] How to quick calculation max Erlang's processes and scheduler can alive based on machine specs
I Gusti Ngurah Oka Prinarjaya
okaprinarjaya@REDACTED
Mon Jul 15 21:14:17 CEST 2019
Hi Andersen,
Wow, thank you very much for the explanation.
>> First, you need to recognize you have a parallelism problem
Yes, I need parallelism, but I don't have time to research GPU
processing.
Now I know how many schedulers I need to provide.
Thank you :)
On Mon, 15 Jul 2019 at 20:23, Jesper Louis Andersen <
jesper.louis.andersen@REDACTED> wrote:
> On Sat, Jul 13, 2019 at 10:47 AM I Gusti Ngurah Oka Prinarjaya <
> okaprinarjaya@REDACTED> wrote:
>
>> Hi,
>>
>> I'm a super newbie. I have done some very simple parallel processing using
>> Erlang. I am experimenting with my database, which contains hundreds of
>> thousands of rows. I split the rows into different offsets, then assign each
>> worker process different rows based on those offsets. For each row I do a
>> simple text-similarity calculation using binary:longest_common_prefix/1.
>>
>>
> First, you need to recognize you have a parallelism problem, and not a
> concurrency problem. So you are interested in what speedup you can get by
> adding more cores, compared to a single-process solution. The key analysis
> parameters are work, span and cost[0]. On top of that, you want to look at
> the speedup factor (S = T_1 / T_p).
>
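
A minimal sketch of the speedup factor mentioned above, assuming you already have two wall-clock timings in the same unit. The module name `speedup` and the `efficiency/3` helper (speedup divided by worker count) are illustrative additions, not something from the original mail.

```erlang
-module(speedup).
-export([speedup/2, efficiency/3]).

%% S = T_1 / T_p: single-process run time divided by the run time
%% with P workers. Both times must be in the same unit.
speedup(T1, Tp) when Tp > 0 ->
    T1 / Tp.

%% Efficiency = S / P. A value of 1.0 would be perfect linear scaling;
%% values well below 1.0 suggest overhead is eating the gain.
efficiency(T1, Tp, P) when P > 0 ->
    speedup(T1, Tp) / P.
```

For example, `speedup:speedup(1000, 250)` gives 4.0, and with 8 workers `speedup:efficiency(1000, 250, 8)` gives 0.5, meaning half of the added capacity went to overhead.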
>
>> 1. How can I do a quick, rough calculation of the maximum number of
>> Erlang processes based on the above machine specs?
>>
>>
> This requires measurement. A single-core/process system has certain
> advantages:
>
> * It doesn't need to lock and latch.
> * It doesn't need to distribute data (scatter) and recombine data (gather).
>
> Adding more processes has an overhead and at a point, it will cease to
> provide speedup. In fact, speedup might go down.
>
> What I tend to do, is to napkin math the cost of a process. The PCB I
> usually set at 2048 bytes. It is probably lower in reality, but an upper
> bound is nice. If each process has to keep, say, 4096 bytes of data around,
> I set it at 2*4096 to account for the GC. So that is around 10 Kilobytes
> per process. If I have a million processes, that is 10 gigabytes of memory.
> If each process is also doing network I/O you need to account for the
> network buffers in the kernel as well, etc. However, since you are looking
> at parallelism, this has less importance since you don't want to keep a
> process per row (the overhead tends to be too big in that case, and the
> work is not concurrent anyway[1]).
>
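
The napkin math above can be written down as a small sketch. The 2048-byte PCB figure and the doubling for GC are the rough upper bounds from the mail, not measured values; the module and function names are hypothetical.

```erlang
-module(napkin).
-export([bytes_per_process/1, total_bytes/2]).

%% Upper-bound guess for the process control block, as in the mail.
-define(PCB_BYTES, 2048).

%% Per-process estimate: PCB plus twice the live data, where the
%% doubling leaves room for the garbage collector's copy.
bytes_per_process(DataBytes) ->
    ?PCB_BYTES + 2 * DataBytes.

%% Estimate for a whole population of identical processes.
total_bytes(NumProcesses, DataBytes) ->
    NumProcesses * bytes_per_process(DataBytes).
```

With 4096 bytes of data per process, `napkin:bytes_per_process(4096)` is 10240 bytes, and `napkin:total_bytes(1000000, 4096)` is 10240000000 bytes, i.e. roughly the 10 gigabytes quoted above.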
>> 2. The running time of the text-similarity processing with 10, 20 or 40
>> workers was blazingly fast, so I cannot see any difference between them.
>> How can I measure it, for example by printing the total elapsed time,
>> so that I can see the difference?
>>
>>
> timer:tc/1 is a good start. eministat[2] is a shameless plug as well.
>
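
As a rough sketch of how timer:tc/1 could be used here: it takes a zero-arity fun and returns the elapsed time in microseconds together with the result. The `measure` module and its function names are made up for illustration; taking the minimum over several runs is one simple way to reduce noise, and is an assumption of this sketch rather than advice from the mail.

```erlang
-module(measure).
-export([time_ms/1, best_of/2]).

%% Run a zero-arity fun once and return {Milliseconds, Result}.
%% timer:tc/1 reports microseconds, so divide by 1000.
time_ms(Fun) when is_function(Fun, 0) ->
    {Micros, Result} = timer:tc(Fun),
    {Micros / 1000, Result}.

%% Run the fun N times and return the fastest time in milliseconds.
best_of(N, Fun) when N > 0 ->
    lists:min([element(1, time_ms(Fun)) || _ <- lists:seq(1, N)]).
```

If a single run is too fast to compare, timing a larger input or many repetitions inside the fun usually makes the difference between 10, 20 and 40 workers visible.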
>> 3. How many schedulers need to be active / available when I create 10
>> processes? Or 20 processes? 40 processes? And so on.
>>
>>
> If your machine has 2 physical cores with two hyperthreads per core, a
> first good ballpark is either 2 or 4 schedulers. Adding more just makes
> them fight for the resources. The `+stbt` option might come in handy if
> supported by your environment. Depending on your workload, you can expect
> anywhere from -30% to +50% performance from the additional hyperthread. In
> some cases it hurts performance:
>
> * Cache contents can be evicted by the additional hyperthread
> * If you don't have memory pressure to make a thread wait, there is
> little additional power in the hyperthread
> * In a laptop environment, the additional hyperthread will generate more
> heat. This might make the CPU clock down, resulting in worse run
> times. This is especially important on MacBooks. They have really miserable
> thermals and put way too powerful CPUs in a bad thermal solution. It gives
> them good peak performance when "sprinting" for short bursts, but bad
> sustained performance, e.g., "marathons". Battery vs AC power also matters
> a lot and will mess with run times.
>
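
To see what the runtime actually started with, a small sketch like the following can help. It only reads values via erlang:system_info/1; the module name `sched_info` is hypothetical, and `logical_processors` may come back as the atom `unknown` on some systems.

```erlang
-module(sched_info).
-export([summary/0]).

%% Collect the scheduler-related settings of the running VM in a map.
summary() ->
    #{schedulers => erlang:system_info(schedulers),
      schedulers_online => erlang:system_info(schedulers_online),
      %% May be the atom 'unknown' if the VM cannot detect the topology.
      logical_processors => erlang:system_info(logical_processors)}.
```

Comparing run times with different scheduler counts (for instance by changing the number online with `erlang:system_flag(schedulers_online, N)`) is one way to test the 2-versus-4 ballpark on your own machine.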
> As for how many processes: you want to have enough to keep all your
> schedulers utilized, but not so many your work is broken into tiny pieces.
> This will mean more scatter/gather IO is necessary, impeding your
> performance. And if that IO is going across CPU cores, you are also looking
> at waiting on caches.
>
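
One way to follow this advice is to split the rows into about as many chunks as there are online schedulers, instead of one process per row. The sketch below is an illustration under that assumption, not the original poster's code; `chunked`, `split/2` and `pmap_chunks/2` are hypothetical names.

```erlang
-module(chunked).
-export([pmap_chunks/2, split/2]).

%% Split List into at most N chunks of near-equal size (scatter step).
split(List, N) when N > 0 ->
    Len = length(List),
    Size = max(1, (Len + N - 1) div N),
    split_every(List, Size).

split_every([], _Size) -> [];
split_every(List, Size) when length(List) =< Size -> [List];
split_every(List, Size) ->
    {Chunk, Rest} = lists:split(Size, List),
    [Chunk | split_every(Rest, Size)].

%% Apply Fun to every element, using one worker process per online
%% scheduler, and return the results in the original order.
pmap_chunks(Fun, List) ->
    Workers = erlang:system_info(schedulers_online),
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, lists:map(Fun, Chunk)} end),
                Ref
            end || Chunk <- split(List, Workers)],
    %% Gather step: receive each chunk's result in spawn order.
    lists:append([receive {Ref, Res} -> Res end || Ref <- Refs]).
```

For example, `chunked:pmap_chunks(fun(X) -> X * 2 end, [1,2,3,4,5])` returns `[2,4,6,8,10]`. Whether one chunk per scheduler is the best granularity for a given workload still has to be measured.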
> If you are really interested in parallel processing, it is probably better
> to look at languages built for the problem space. Rust, with its rayon
> library. Or something like https://futhark-lang.org/ might be better
> suited. Or even look at TensorFlow. It has a really strong, optimized,
> numerical core. Erlang, being bytecode interpreted, pays an overhead which
> you have to balance out with either more productivity, ease of programming,
> faster prototyping or the like. Erlang tends to be stronger at MIMD style
> processing (and so does e.g., Go).
>
> [0] https://en.wikipedia.org/wiki/Analysis_of_parallel_algorithms
> [1] your work is classical SIMD rather than MIMD.
> [2] github.com/jlouis/eministat
>
> --
> J.
>