<div dir="ltr">Hi Dániel,<div><br></div><div>>> Try experimenting with different number of processes while monitoring the scheduler utilisation (e.g. with observer): if you're much below 100% utilisation (across all </div><div>>> schedulers), you have too few</div><div>I am lucky, i always get 100% utilisation</div><div><br></div><div>>> If, on the other hand, you see the run queue going up (the number of runnable processes that are waiting for a CPU slice to run), you have too many.</div><div>Where to see this?</div><div><br></div><div>Thank you :)</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Pada tanggal Min, 14 Jul 2019 pukul 03.36 Dániel Szoboszlay <<a href="mailto:dszoboszlay@gmail.com">dszoboszlay@gmail.com</a>> menulis:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Your second and third questions are easy to answer: measure the execution time of functions with timer:tc and even with a single scheduler you can run as many processes as you want. They will compete for a single core though, and will have to wait a long time to get some CPU time once scheduled out. So just stick to the default, and use as many schedulers as many cores you have.<div><br></div><div>Now, finding the maximum (or rather: optimal) number of processes to perform this particular task on your particular machine is hard. A very dumb calculation would be that because all of the processes will be doing the same, CPU-bound task, they will all compete for the same HW resources, so you won't gain much by having more processes than CPU cores (4 in your case). If accessing the rows involves some I/O, than you should use more processes, so some processes can run the CPU-bound text calculations while others wait for I/O. Try experimenting with different number of processes while monitoring the scheduler utilisation (e.g. with observer): if you're much below 100% utilisation (across all schedulers), you have too few, If, on the other hand, you see the run queue going up (the number of runnable processes that are waiting for a CPU slice to run), you have too many.</div><div><br></div><div>But you can safely use a bit more processes than the minimum needed to saturate the CPU. It can even speed up the whole job a bit if not all rows take equal time to process (consider one process getting a chunk of super slow to process rows: at the end of all other processes will have finished and you'll have to wait for this big worker to do its work on a single core; having twice as many processes would cut the chunk into two halves, also halving the time to wait at the end). However, after one (hard to find) point adding more processes would hurt performance: more processes means more cache misses and more synchronisation overhead at the beginning and end of the job.</div><div><br></div><div>The theoretical maximum number of processes is probably constrained by your RAM: measure how much memory one process needs, and divide 8 GB (minus some for the OS and other programs) with this number. You won't be able to fit more processes in RAM, and swapping will only slow down your computation. 
<div><br></div><div>Hope this helps,</div><div>Daniel</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, 13 Jul 2019 at 10:47, I Gusti Ngurah Oka Prinarjaya <<a href="mailto:okaprinarjaya@gmail.com" target="_blank">okaprinarjaya@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>I'm a super newbie, and I have done some very simple parallel processing using Erlang. I am experimenting with my database, which contains a few hundred thousand rows. I split the rows into chunks at different offsets, then assign each worker process a different set of rows based on those offsets. For each row I do a simple text-similarity calculation using binary:longest_common_prefix/1.</div><div><br></div><div>Let's assume my total is 200,000 rows of data.</div><div>First, I tried creating 10 worker processes and assigning 20,000 rows to each worker process.</div><div>Second, I tried creating 20 worker processes and assigning 10,000 rows to each worker process.</div><div>Third, I tried creating 40 worker processes and assigning 5,000 rows to each worker process.</div><div><br></div><div>My machine specs:<br>- MacBook Pro (13-inch, 2017, Four Thunderbolt 3 Ports)</div><div>- Processor 3.1 GHz Intel Core i5 (2 physical cores, with HT)</div><div>- RAM 8 GB 2133 MHz LPDDR3</div><div><br></div><div>My questions are:</div><div><br></div><div>1. How can I do a quick / dumb / simple calculation of the maximum number of Erlang processes for the machine specs above?<br><br>2. The running time of the similar-text processing with 10, 20 or 40 workers was blazingly fast, so I cannot see any difference. How can I measure it, or print the total minutes out, so that I can see the difference?</div><div><br></div><div>3. How many schedulers need to be active/available when I create 10 processes? Or 20 processes? 40 processes? And so on.</div><div><br></div><div>Please enlighten me.</div><div><br></div><div>Thank you super much</div><div><br></div></div>
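<div><br></div><div>For concreteness, a rough sketch of this kind of split-and-spawn setup (module and function names are made up, the rows are assumed to already be loaded as a list of binaries, and the reference text in process_chunk/1 is just a placeholder):</div><div><pre>
-module(prefix_jobs).
-export([run/2]).

%% Split Rows into roughly NumWorkers chunks, process every chunk in its
%% own process, wait for all of them, and print the total wall-clock time.
run(Rows, NumWorkers) ->
    {Micros, ok} = timer:tc(fun() -> run_workers(Rows, NumWorkers) end),
    io:format("finished in ~.1f seconds~n", [Micros / 1000000]).

run_workers(Rows, NumWorkers) ->
    ChunkSize = max(1, length(Rows) div NumWorkers),
    Chunks = chunk(Rows, ChunkSize),
    Parent = self(),
    Pids = [spawn_link(fun() -> Parent ! {done, self(), process_chunk(Chunk)} end)
            || Chunk <- Chunks],
    _ = [receive {done, Pid, _Result} -> ok end || Pid <- Pids],
    ok.

%% Placeholder similarity check: compare every row against some reference text.
process_chunk(Chunk) ->
    [binary:longest_common_prefix([Row, <<"some reference text">>]) || Row <- Chunk].

%% Split a list into chunks of (at most) N elements.
chunk([], _N) -> [];
chunk(List, N) when length(List) =< N -> [List];
chunk(List, N) ->
    {Chunk, Rest} = lists:split(N, List),
    [Chunk | chunk(Rest, N)].
</pre></div><div><br></div><div>Something like prefix_jobs:run(Rows, 20) would then print how long the whole job took, which makes it easy to compare 10, 20 or 40 workers.</div>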
</blockquote></div>
</blockquote></div>