[erlang-questions] My frustration with Erlang

Sun Sep 14 09:21:41 CEST 2008

I haven't read the whole correspondence (it seems to have been going on for 
far too long), but I'd like to add my 2c worth...

While the single-scheduler (+S 1) approach may solve some problems, it 
defeats the purpose of having a multi-core machine. Please note that 
multi-core machines often have lower clock speeds, so each individual core 
generally runs slower. IMHO, if +S 1 solves your problem, maybe you should 
revisit your code. I think it is wrong to expect the same code to run faster 
on SMP just because you expected it to. For example, it is a known fact that 
ETS is slower in an SMP environment.
Also, one should not forget to use +A in addition to +S. Although you do 
not have any disk I/O, I think this parameter is relevant for port 
scheduling, and could therefore improve the performance of your I/O.
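
As a rough sketch of what I mean (the values 4 and 16 are only illustrative 
assumptions for a 4-core box, not tuned recommendations), the node could be 
started with both flags, and the settings checked from the shell:

```
$ erl -smp enable +S 4 +A 16

1> erlang:system_info(schedulers).        % number of scheduler threads (+S)
2> erlang:system_info(thread_pool_size).  % size of the async thread pool (+A)
```

Comparing throughput between this and a +S 1 run of the same test should at 
least show whether the schedulers or the code are the bottleneck.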

V.

----- Original Message ----- 
From: "Kevin Scaldeferri" <kevin@REDACTED>
To: "Edwin Fine" <erlang-questions_efine@REDACTED>
Cc: "erlang-questions Questions" <erlang-questions@REDACTED>
Sent: Sunday, September 14, 2008 12:07 AM
Subject: Re: [erlang-questions] My frustration with Erlang

>
> On Sep 13, 2008, at 1:56 PM, Edwin Fine wrote:
>
>> You'd probably have to partition the load to round-robin across the
>> individual VMs, possibly using some front-end load-balancing
>> hardware. This is why I keep harping on this: some time ago I put
>> the system I am working on under heavy load to test the maximum
>> possible throughput. There was no appreciable disk I/O. The kicker
>> is that I did not see an even distribution of load across the 4
>> cores of my box. In fact, it looked as if one or maybe two cores
>> were being used at 100% and the rest were idle. When I re-ran the
>> test on a whim, using only 1 non-SMP (+S 1) node, I actually got
>> better performance.
>>
>> This seemed counter-intuitive and against the "Erlang SMP scales
>> linearly for CPU-intensive loads" idea. I have not done a lot of
>> investigation into this because I have other fish to fry right now,
>> but the folks over at LShift (RabbitMQ) - assuming I did not
>> misunderstand them - wrote that they had seen similar behavior when
>> running clustered Rabbit nodes (i.e. better performance from N
>> single-CPU nodes than N +S N nodes). However, they, like me, are not
>> ready to come out and state this bluntly as a fact because (I
>> believe) they feel not enough investigation has been done to make
>> this a conclusive case.
>
> I've also been seeing similar behavior trying to parallelize the
> alioth shootout code, fwiw.  I'd also say it's premature to draw any
> concrete conclusions, but another anecdotal point.
>
> (Also, on the particular OS & hardware the benchmarks run on, the
> total CPU usage nearly doubles for the parallel implementations.  On
> my 2-core mac, though, I see no more than a 10% increase in total CPU
> usage, and a near 100% improvement in the wall-time, as one should
> expect on the embarrassingly parallel problems.  Dunno, if this is
> related to the OS, the chip (Core 2 Duo vs Core 2 Quad), HiPE, or what.)
>
>
> -kevin
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions