[erlang-questions] Strange observation running in an SMP environment.

Tue Jun 12 22:39:02 CEST 2012

On Tue, Jun 12, 2012 at 9:36 PM, Jesper Louis Andersen
<jesper.louis.andersen@REDACTED> wrote:
> On Mon Jun 11 21:48:20 2012, Ronny Meeus wrote:
>
> [Ring benchmark - showing that 2 rings is considerably slower than one]
>
>> Can somebody explain this?
>
>
> There are a couple of factors which play a role here. First of all, you can
> use the nice percept tool to look at your code:
>
> https://gist.github.com/2919608
>
> run t:start(), t:analyze(), t:server(). Then point your browser to
> localhost:8989 and have fun :)
>

Thanks for this hint. It is a wonderful tool that I was not aware of.
This is also one of the things I like about Erlang: it comes with a
lot of tools and libraries out of the box.

> The first problem is that the concurrency of each of your rings is 1.
> This means you can at most keep one core busy on each ring and your maximum
> speedup is thus 2. To see this, note that only one process can run in the
> ring at a time. All others are waiting for the token to come to them. This
> is also shown in the percept graph: At the start, you have a very high
> concurrency level because you just spawned a lot of processes and they are
> now runnable. Then the ring connect phase happens and this blocks many of
> them in a receive waiting for a {connect, ...} message. When we run around
> the ring, the concurrency level is hovering around 2 to 4. Which makes sense
> since we added two Led-processes which also need to run from time to time.
>
> The second problem is that your benchmark is not doing work. So what you are
> measuring is locking overhead from the Erlang VM and your kernel. Passing a
> message is VERY fast in Erlang and that is all you are doing. If you have 4
> cores in your machine, they are stealing work from each other all the time
> and this accounts for the overhead. You could try playing around with `erl
> +sbt tnnps` as an erlang flag, but it may/may not give you anything here.
>
> It is also worth trying to disable SMP entirely `erl -smp disable` since
> this avoids any kind of locking. In some cases that is faster for some
> workloads. And yours might be one.
>
> --
> Jesper Louis Andersen
>  Erlang Solutions Ltd., Copenhagen, DK

I added a load consumer to the application: each time a worker
receives a message, it forwards the message and generates some load
(code uploaded to BitBucket). After this change the percept tool clearly
shows that on average something like 15 processes are busy.
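
For illustration, here is a minimal sketch of what such a worker could
look like. It is not the code uploaded to BitBucket: the module name
ring_load_sketch, the {token, Hops} and stop messages, the Work
parameter and the burn/1 busy loop are all assumptions made for this
example. The point it shows is that a worker forwards the token first
and only then generates load, so several workers can be busy at once.

```erlang
-module(ring_load_sketch).
-export([run/3]).

%% Hypothetical stand-in for "some load": a pure CPU-bound countdown.
burn(0) -> ok;
burn(N) when N > 0 -> burn(N - 1).

%% A worker first waits to learn its successor, then handles tokens.
worker(Work) ->
    receive
        {connect, Next} -> loop(Next, Work)
    end.

loop(Next, Work) ->
    receive
        {token, Hops} ->
            Next ! {token, Hops + 1},  % forward the token first ...
            burn(Work),                % ... then generate load
            loop(Next, Work);
        stop ->
            Next ! stop,               % pass the shutdown along and exit
            ok
    end.

%% Build a ring of N workers (N >= 1), send the token around Rounds
%% times, and return the total number of hops. The calling process
%% closes the ring: the last worker sends back to the caller.
run(N, Rounds, Work) when N >= 1 ->
    Pids = [spawn_link(fun() -> worker(Work) end) || _ <- lists:seq(1, N)],
    Nexts = tl(Pids) ++ [self()],
    lists:foreach(fun({P, Next}) -> P ! {connect, Next} end,
                  lists:zip(Pids, Nexts)),
    Hops = circulate(hd(Pids), Rounds, 0),
    hd(Pids) ! stop,
    receive stop -> ok end,
    Hops.

circulate(_First, 0, Hops) -> Hops;
circulate(First, Rounds, Hops) ->
    First ! {token, Hops},
    receive
        {token, NewHops} -> circulate(First, Rounds - 1, NewHops)
    end.
```

Something like timer:tc(ring_load_sketch, run, [10, 10, 1000000]) can
be used to time a ring of 10 workers and 10 roundtrips; the Work value
here is arbitrary.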

These are the new measured times (for a ring of 10 processes and 10
message roundtrips):

erl -smp disable
Finished. Time:12769406
Finished. Time:12843219

taskset 1 erl (1 core)
Finished. Time:13701993
Finished. Time:14170071

taskset 3 erl (2 cores)
Finished. Time:5954985
Finished. Time:7362113

erl (4 cores)
Finished. Time:3542247
Finished. Time:3555036

So this looks better: the execution time now decreases roughly linearly
with the number of cores.
Also, running with SMP disabled is faster than running in SMP mode
restricted to 1 core using taskset.
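
As a quick check of the numbers above, the speedups can be computed
from the measured times. This is only a small sketch: the module and
function names are made up for the example, and it takes the mean of
the two runs per configuration, using the "taskset 1" runs as the
1-core baseline.

```erlang
-module(speedup_sketch).
-export([report/0, smp_overhead/0]).

mean(Xs) -> lists:sum(Xs) / length(Xs).

%% Baseline: SMP emulator restricted to one core (taskset 1 erl).
one_core() -> mean([13701993, 14170071]).

%% Speedup relative to the 1-core SMP runs, as {Cores, Speedup} pairs.
report() ->
    Runs = [{2, [5954985, 7362113]},    % taskset 3 erl
            {4, [3542247, 3555036]}],   % erl
    [{Cores, one_core() / mean(Ts)} || {Cores, Ts} <- Runs].

%% Ratio of 1-core SMP time to the time with SMP disabled.
smp_overhead() ->
    one_core() / mean([12769406, 12843219]).
```

With these times, report() gives a speedup of roughly 2.09 on 2 cores
and 3.93 on 4 cores, and smp_overhead() is roughly 1.09, i.e. SMP mode
on one core is about 9% slower than with SMP disabled.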

Does the issue that I had (pure message passing without extra load)
have anything to do with the fact that I'm running on Linux?
I sometimes have the impression (not in the context of my Erlang
playing) that the scheduler does not always make the right decision
when migrating processes between cores.

Many thanks for the quick response.

--
Regards,
Ronny