My program has a large number of processes running in parallel (it's a neural network, and each process is a neuron). Each process has its own ETS table (set, private). Each process does a bit of work on its own table and then sends a message to another process.

But even though each process has its own ETS table, when I move the program from a single core to a quad core (going from SMP disabled to SMP enabled on the same machine, keeping the same CPU speed, RAM, FSB...), the execution time doubles: it goes from 200us to 400us per single traversal of the entire net of processes (the processes are traversed thousands of times). I reprogrammed the entire thing using only a single public ETS table (just to see the difference); the single-CPU program didn't change its speed, but on the quad core the execution time increased even further. Rewriting the program with dict, on the other hand, does speed up execution when moving from single core to quad core. Though as you guys know from running benchmarks, dict, while having a very small (<<1us) fetch time, has a huge store time that grows larger and larger with the number of elements stored (10us-100us...), and in my case each process needs to scale up to deal with millions of elements, hence the use of ETS. Using SET, on the other hand, is even worse than dict, in both insertion and random fetching.
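The store/fetch comparison above can be reproduced with something like the following minimal sketch. The module name store_bench and the function run/1 are made up for illustration and are not part of the original program; it only times N sequential inserts and lookups with timer:tc, so the absolute numbers will differ from the figures quoted above.

```erlang
-module(store_bench).
-export([run/1]).

%% Time N inserts and N lookups for a private ETS set and for a dict.
%% Returns the average number of microseconds per operation for each case.
%% (Hypothetical helper, not taken from the original program.)
run(N) when is_integer(N), N > 0 ->
    Keys = lists:seq(1, N),
    Tab = ets:new(bench, [set, private]),
    {EtsStore, _} = timer:tc(fun() ->
        lists:foreach(fun(K) -> ets:insert(Tab, {K, K}) end, Keys)
    end),
    {EtsFetch, _} = timer:tc(fun() ->
        lists:foreach(fun(K) -> ets:lookup(Tab, K) end, Keys)
    end),
    ets:delete(Tab),
    %% dict is immutable, so the store loop threads the dict through a fold.
    {DictStore, Dict} = timer:tc(fun() ->
        lists:foldl(fun(K, D) -> dict:store(K, K, D) end, dict:new(), Keys)
    end),
    {DictFetch, _} = timer:tc(fun() ->
        lists:foreach(fun(K) -> dict:fetch(K, Dict) end, Keys)
    end),
    [{ets_store, EtsStore / N}, {ets_fetch, EtsFetch / N},
     {dict_store, DictStore / N}, {dict_fetch, DictFetch / N}].
```

Calling store_bench:run(100000) in shells started with and without -smp gives a rough per-operation comparison for a single process.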

The question is: why does it slow down with SMP enabled when each of the processes has its own ETS table?

Here is how far I've gotten with the problem:

I don't think there is bottlenecking due to mailboxes. For one, each process has no more than 100 other processes connected to it (during a standard test), and in a smaller test where each process is connected to only 2 or 4 other processes, the same thing occurs. I use an ETS table so that messages don't build up in the mailbox: as soon as a message arrives at a process, the process immediately inserts it into the table, with the Pid of the sending process as the key and the value the sender sent along with its Pid as the value (e.g. {self(), prediction, Value}). With an insertion time of ~2 microseconds and only, let's say, 4 other processes connected to any one process, there is no bottlenecking due to the mailbox. (That's why I'm using ETS: it's essential for the speed of the network, and for the ability to efficiently and quickly access any value, any input, at any time, at random.)
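The receive-and-store step described above could look roughly like the following minimal sketch. Only the {self(), prediction, Value} message shape comes from the description; the module name neuron_sketch, the get and stop messages, and demo/0 are hypothetical additions for illustration, not the original code.

```erlang
-module(neuron_sketch).
-export([start/0, demo/0]).

%% Spawn a hypothetical neuron process that owns a private ETS set.
start() ->
    spawn(fun() ->
        Tab = ets:new(inputs, [set, private]),
        loop(Tab)
    end).

loop(Tab) ->
    receive
        {From, prediction, Value} when is_pid(From) ->
            %% Store the latest value immediately, keyed by the sender's Pid.
            ets:insert(Tab, {From, Value}),
            loop(Tab);
        {From, get, Sender} when is_pid(From) ->
            %% A private table can only be read by its owner, so lookups
            %% from outside have to go through the owning process.
            From ! {self(), value, ets:lookup(Tab, Sender)},
            loop(Tab);
        stop ->
            ok
    end.

%% Send one prediction, read it back, then stop the neuron.
demo() ->
    Me = self(),
    N = start(),
    N ! {Me, prediction, 0.5},
    N ! {Me, get, Me},
    Result = receive
                 {N, value, [{Me, V}]} -> V
             after 1000 -> timeout
             end,
    N ! stop,
    Result.
```

Because the table is created inside the spawned process, that process is the owner, and the table goes away when the process exits.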

I've tested this setup with most of the calculations in each process disabled (second-order derivatives and other calculations), to see what happens with just message passing; the same problem occurs. I've now also tried the code on a single-CPU laptop, and a very peculiar thing happens. Without SMP enabled it runs at ~300us per pass; with SMP enabled (it still only has 1 CPU, I simply ran erl -smp), it goes up to ~450us. Something funny is going on with SMP and ETS.

On the quad core I've gathered the following data. Keeping everything else constant, the only thing I changed was the number of schedulers:

smp disabled: 200us per network pass
-smp +S 1: 300us
-smp +S 4: 350us
-smp +S 8: 1.14733e+4 us
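To make measurements like these comparable across emulator settings, a small harness along the following lines can record the scheduler configuration next to the timing. This is only a sketch: pass_timer, report/2 and PassFun are hypothetical names, with PassFun standing in for one full traversal of the network.

```erlang
-module(pass_timer).
-export([report/2]).

%% Run PassFun Passes times and report the average time per pass together
%% with the scheduler settings of the running emulator.
%% PassFun is a placeholder for one traversal of the net (an assumption).
report(PassFun, Passes)
  when is_function(PassFun, 0), is_integer(Passes), Passes > 0 ->
    {Total, ok} = timer:tc(fun() -> repeat(PassFun, Passes) end),
    [{smp_support, erlang:system_info(smp_support)},
     {schedulers, erlang:system_info(schedulers)},
     {us_per_pass, Total / Passes}].

%% Call Fun N times, discarding its results.
repeat(_Fun, 0) -> ok;
repeat(Fun, N) -> Fun(), repeat(Fun, N - 1).
```

For example, pass_timer:report(fun() -> ok end, 1000) returns a list with smp_support, schedulers and us_per_pass entries, which can be collected once per erl invocation.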

Has anyone ever come across a similar problem with ETS tables?

Regards,
-Gene