[erlang-questions] A problem with ETS + SMP

Gene Sher corticalcomputer@REDACTED
Wed Mar 26 22:29:04 CET 2008


My program has a large number of parallel operating processes (it's a neural
network; each process is a neuron), and each process has its own ETS table
(set, private). Each process does a bit of work on its own table, and then
sends a message to another process... But even though each process has its
own ETS table, when moving the program from a single core to a quad core
(going from SMP disabled to SMP enabled on the same machine), the execution
time doubles (while retaining the same CPU speed, RAM, FSB...). It goes from
200us to 400us per single traversal of the entire net of processes (the
processes are traversed thousands of times...).

I reprogrammed the entire thing using only a single public ETS table (just to
see the difference); the single-CPU version didn't change its speed, but on
the quad core the execution time increased even further. Rewriting the
program with dict, on the other hand, does speed up execution when moving
from a single core to the quad core. Though, as you guys know from running
benchmarks, dict, while having a very small (<<1us) fetch time, has a huge
store time, which grows larger and larger with the number of elements stored
(10us-100us...), and in my case each process needs to scale up to deal with
millions of elements, hence the use of ETS. Using the sets module is even
worse than dict, for both insertion and random fetching.
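
To make the store-time difference concrete, here is a rough sketch of the
kind of micro-benchmark I mean (the module and function names are made up
for illustration; this is not my actual benchmark code):

    -module(store_bench).
    -export([run/1]).

    %% Compare the per-operation store time of ets:insert vs dict:store
    %% over N elements.
    run(N) ->
        Tab = ets:new(bench, [set, private]),
        EtsUs = time_us(fun() -> ets_fill(Tab, N) end),
        DictUs = time_us(fun() -> dict_fill(dict:new(), N) end),
        io:format("ets:insert: ~.2f us/op, dict:store: ~.2f us/op~n",
                  [EtsUs / N, DictUs / N]).

    %% Wall-clock time of Fun() in microseconds.
    time_us(Fun) ->
        T0 = erlang:now(),
        Fun(),
        timer:now_diff(erlang:now(), T0).

    ets_fill(_Tab, 0) -> ok;
    ets_fill(Tab, K) ->
        ets:insert(Tab, {K, K}),
        ets_fill(Tab, K - 1).

    dict_fill(_D, 0) -> ok;
    dict_fill(D, K) ->
        dict_fill(dict:store(K, K, D), K - 1).

With a large N (say store_bench:run(1000000)) the average dict store time
should come out visibly larger than the ETS insert time.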

The question is: why does it slow down with SMP activated, when each of the
processes has its own ETS table?

Here is how far I've gotten with the problem:
I don't think there is bottlenecking due to mailboxes. For one, each process
has no more than 100 other processes connected to it (during a standard
test), and in a smaller test where each process is connected to only 2 or 4
other processes, the same thing occurs. I use the ETS table precisely so
that messages won't build up in the mailbox: as soon as a message arrives at
a process, it right away enters it into the table, with the key being the
Pid of the process that sent the message, and the value being whatever the
sender sent along with its Pid (e.g. {self(), prediction, Value}). With an
insertion time of ~2 microseconds, and only, let's say, 4 other processes
connected to each process, there is no bottlenecking due to the mailbox.
(That's why I'm using ETS: it's essential for the speed of the network, and
for the ability to efficiently and quickly access any value, any input, any
time... at random. A sketch of this receive-and-store pattern is below.)
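
In case it helps, here is a minimal sketch of that pattern (the module name,
the fire message, and the Targets list are made up for illustration; the
real neurons do much more work per pass):

    -module(neuron).
    -export([start/0]).

    start() ->
        spawn(fun() ->
            %% Each neuron owns its own private ETS table.
            Tab = ets:new(inputs, [set, private]),
            loop(Tab)
        end).

    loop(Tab) ->
        receive
            {From, prediction, Value} ->
                %% Drain the mailbox right away: store the latest
                %% value under the sender's Pid (the tuple key).
                ets:insert(Tab, {From, prediction, Value}),
                loop(Tab);
            {fire, Targets} ->
                %% Do a bit of work on the local table, then send a
                %% message on to the connected processes.
                Sum = ets:foldl(fun({_P, prediction, V}, Acc) -> V + Acc end,
                                0, Tab),
                [T ! {self(), prediction, Sum} || T <- Targets],
                loop(Tab)
        end.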

I've tested this setup with most of the calculations in each process
disabled, to see what happens with just the message passing (second-order
derivatives... and other calculations turned off); the same problem occurs.
I've now tried the code on a single-CPU laptop, and a very peculiar thing
happens: without SMP enabled it runs at ~300us per pass, while with SMP
enabled (and it still only has 1 CPU; I simply run: erl -smp) it goes up to
~450us. Something funny is going on with SMP and ETS.
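
For reference, these are the sorts of invocations I'm comparing (-smp
disable forces the non-SMP emulator, and +S pins the number of schedulers):

    erl -smp disable     # SMP disabled
    erl -smp             # SMP enabled, default number of schedulers
    erl -smp +S 4        # SMP enabled, 4 schedulers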

On the quad core I've gathered the following data. Letting everything else
stay constant, the only thing I changed was the number of schedulers:

smp disabled: 200us per network pass
-smp +S 1:    300us
-smp +S 4:    350us
-smp +S 8:    1.14733e+4 us (~11,473us)
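
Just to be sure the +S setting was taking effect, the scheduler count can be
confirmed from inside the VM (generic example, not output from my runs):

    1> erlang:system_info(schedulers).
    4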

Has anyone ever come across a similar problem with ETS tables?
Regards,
-Gene

