<div dir="ltr">I think you have misunderstood my reasoning here. If you have 8 Erlang VMs going, each with +S 1, how exactly does this defeat the purpose of having a multi-core machine? Long before threading models came into vogue, multiple processes were taking advantage of multi-CPU systems simply by letting the OS scheduler choose which CPU to run the next runnable process on. Since Erlang processes are "green" threads, they don't individually use the threading model of the underlying operating system anyway. Other than for I/O operations on files (controlled by +A, I believe), each VM uses, AFAIK, one OS thread per scheduler, so +S 8 will use 8 OS threads. When you have 8 threads sharing something (which they will when running SMP), there is a risk of contention slowing things down.<br>
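<br>A minimal sketch of the "several VMs, each with +S 1" setup described above. The node names (vm1, vm2, ...) and the DRY_RUN switch are assumptions made up for illustration; +S, -sname and -detached are standard erl flags. By default the script only prints the commands it would run.<br>

```shell
#!/bin/sh
# Sketch: start one Erlang VM per core, each with a single scheduler (+S 1).
# Node names (vm1, vm2, ...) and the DRY_RUN switch are illustrative assumptions.

# Print the command line for the N-th single-scheduler node.
node_cmd() {
    printf 'erl +S 1 -sname vm%s -detached\n' "$1"
}

# Print (DRY_RUN=1, the default) or run the commands for nodes 1..COUNT.
start_nodes() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            node_cmd "$i"
        else
            # Execute the printed command line; requires erl on the PATH.
            eval "$(node_cmd "$i")"
        fi
        i=$((i + 1))
    done
}
```

Calling start_nodes 8 prints eight command lines, one per VM; with DRY_RUN=0 in the environment it runs them instead. The OS scheduler is then free to place each single-threaded VM on whichever core it likes.<br>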

<br>I wonder what happens when Erlang processes running on the scheduler thread of one core send lots of messages to Erlang processes running on a scheduler on a different core. There HAS to be a lock there somewhere while the message moves from the memory space of the first process to the second one, or, if shared memory is being used (the more likely scenario), there is going to be some locking of shared data structures. Now if there are thousands of processes sending, in aggregate, hundreds of thousands of messages, I believe this will not scale well if the messages cross scheduler boundaries. I could be wrong, but it fits the anecdotal data. It would be very interesting to see the effect of designing an application so that processes that send lots of messages to each other run on the same scheduler, sort of "clustering" the messages so that they stay within the same VM's memory space. Using multiple VMs with +S 1 would force this to happen because there IS only one scheduler thread, so hopefully the VM doesn't use any SMP locks/mutexes under those conditions. This is in contrast to assuming a uniform model where the cost of sending a message is the same regardless of whether it is going to a different process space and incurring the cost of a lock. The locks used are POSIX semaphores, IIRC, and they are not cheap.<br>

<br>I guess what I am trying to say is that the basic assumption on which Erlang applications seem to have been designed up to now, namely that the cost of sending a message is extremely low, is perhaps not as true as it used to be when using SMP. I would be very interested in hearing from Joe Armstrong about this. I seem to recall that he wrote something about the cost of what a process does needing to be much greater than the cost of starting a process or sending a message to it in order to scale. This is true of any IPC, even cheap IPC like Erlang's. On top of that, things that use shared structures like ETS heavily, for example Mnesia transactions, are possibly going to suffer in the SMP context. All of this is just conjecture based on my own work and anecdotal evidence presented by some others, but I feel in my bones that there is something to this. Look at it this way: Erlang got its major performance from eliminating locks. Running on a 1024-processor system in SMP mode and treating it like it's one big uniform processor is going to backfire badly. I haven't seen much discussion about this. Maybe it's too obvious to mention.<br>

<br>There is an interesting discussion (<a href="http://www.erlang.org/pipermail/erlang-questions/2008-January/032273.html">http://www.erlang.org/pipermail/erlang-questions/2008-January/032273.html</a>) about assigning individual +S 1 Erlang VMs to a given CPU using processor affinity. This could help considerably by keeping more of a VM's running state in the processor's code and data caches than if the OS were to switch the VM's thread between CPUs.<br>
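<br>On Linux, one way to experiment with this is taskset from util-linux, which starts a command with a given CPU affinity. A minimal sketch, assuming Linux, CPUs numbered from 0, and node names (vm0, vm1, ...) that are made up for illustration; it only prints the commands so they can be inspected first.<br>

```shell
#!/bin/sh
# Sketch: pin one single-scheduler Erlang VM to each CPU with taskset (Linux).
# Node names are illustrative; CPU numbering from 0 is an assumption.

# Print the command that starts node vmCPU pinned to CPU number CPU.
pinned_node_cmd() {
    printf 'taskset -c %s erl +S 1 -sname vm%s -detached\n' "$1" "$1"
}

# Print one pinned command per CPU, for CPUs 0..NCPUS-1.
pinned_node_cmds() {
    ncpus=$1
    cpu=0
    while [ "$cpu" -lt "$ncpus" ]; do
        pinned_node_cmd "$cpu"
        cpu=$((cpu + 1))
    done
}
```

For a 4-core box, pinned_node_cmds 4 prints four lines; piping that output to sh would actually start the nodes. Whether pinning helps in practice is exactly the open question here, so this is only a way to set up the experiment.<br>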

<br>I plan Real Soon Now ;-) to do some extensive research into this to see if there is merit to it. Just as soon as I get over the hump in a current project.<br><br>Anyway, hopefully this clarifies my thinking to you and you have less of an issue with it.<br>

<br><br><div class="gmail_quote">On Sun, Sep 14, 2008 at 3:21 AM, Valentin Micic <span dir="ltr">&lt;<a href="mailto:valentin@pixie.co.za">valentin@pixie.co.za</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

I haven't read the whole correspondence (it seems to have been going on for far too long), but I'd like to add my 2c worth...<br>

<br>

While the SMP (+S 1) approach may solve some problems, it defeats the purpose of having a multi-core machine. Please note that multi-core machines have lower clock speeds, and thus should generally run slower per CPU core. IMHO, if +S 1 solves your problem, maybe you should revisit your code -- I think that it is wrong to expect that the same code would work better on SMP just because you had such expectations. For example, it is a known fact that ETS works slower in an SMP environment.<br>


Also, one should not forget to use +A in addition to +S -- although you do not have any disk I/O, I think this parameter is relevant for PORT scheduling, therefore improving performance of your I/O.<br>

<br>

V.<br>

<br>

<br>

----- Original Message ----- From: "Kevin Scaldeferri" &lt;<a href="mailto:kevin@scaldeferri.com" target="_blank">kevin@scaldeferri.com</a>&gt;<br>

To: "Edwin Fine" &lt;<a href="mailto:erlang-questions_efine@usa.net" target="_blank">erlang-questions_efine@usa.net</a>&gt;<br>

Cc: "erlang-questions Questions" &lt;<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a>&gt;<br>

Sent: Sunday, September 14, 2008 12:07 AM<br>

Subject: Re: [erlang-questions] My frustration with Erlang<br>

<br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="Wj3C7c">

<br>

On Sep 13, 2008, at 1:56 PM, Edwin Fine wrote:<br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

You'd probably have to partition the load to round-robin across the<br>

individual VMs, possibly using some front-end load-balancing<br>

hardware. This is why I keep harping on this: some time ago I put<br>

the system I am working on under heavy load to test the maximum<br>

possible throughput. There was no appreciable disk I/O. The kicker<br>

is that I did not see an even distribution of load across the 4<br>

cores of my box. In fact, it looked as if one or maybe two cores<br>

were being used at 100% and the rest were idle. When I re-ran the<br>

test on a whim, using only 1 non-SMP (+S 1) node, I actually got<br>

better performance.<br>

<br>

This seemed counter-intuitive and against the "Erlang SMP scales<br>

linearly for CPU-intensive loads" idea. I have not done a lot of<br>

investigation into this because I have other fish to fry right now,<br>

but the folks over at LShift (RabbitMQ) - assuming I did not<br>

misunderstand them - wrote that they had seen similar behavior when<br>

running clustered Rabbit nodes (i.e. better performance from N<br>

single-CPU nodes than N +S N nodes). However, they, like me, are not<br>

ready to come out and state this bluntly as a fact because (I<br>

believe) they feel not enough investigation has been done to make<br>

this a conclusive case.<br>

</blockquote>

<br>

I've also been seeing similar behavior trying to parallelize the<br>

alioth shootout code, fwiw.  I'd also say it's premature to draw any<br>

concrete conclusions, but another anecdotal point.<br>

<br>

(Also, on the particular OS & hardware the benchmarks run on, the<br>

total CPU usage nearly doubles for the parallel implementations.  On<br>

my 2-core mac, though, I see no more than a 10% increase in total CPU<br>

usage, and a near 100% improvement in the wall-time, as one should<br>

expect on the embarrassingly parallel problems.  Dunno, if this is<br>

related to the OS, the chip (Core 2 Duo vs Core 2 Quad), HiPE, or what.)<br>

<br>

<br>

-kevin<br></div></div><div class="Ih2E3d">


</div></blockquote>

<br>

<br>

</blockquote></div><br></div>