[erlang-questions] Problems Writing a Scalable Shared Data Structure

Tue Jul 8 01:20:56 CEST 2008

Thanks Robert and Hynek.

Hynek; playing around with the number of scheduler threads (+S) or
asynchronous threads (surprisingly) does not significantly alter the
results. As for the tree size, on average it should be around 500 nodes,
assuming it's balanced this gives us a bst that's roughly 8 nodes deep. A
tree this big should scale well with 8 processors. Moreover, I'd expect a
tree this big to  scale almost perfectly going from 1 to 2 processors;
especially if the ratio of lookups (contains()), that don't modify the data
structure, is high. However, all I'm seeing is minor improvements. Moreover,
increasing the size of the tree from 1000 to 10000 does improve scalability
a bit, but not by all that much either.

Robert; I think you're referring to an earlier version of my code, where I
don't wait for the results of an insert or a delete. The lastest one (
http://www.cs.auckland.ac.nz/~fuad/cbst.erl<http://www.cs.auckland.ac.nz/%7Efuad/cbst.erl>),
the final node inserted/deleted does inform the client whether the operation
was successful or not. I can't confirm, but on a big enough tree stacking
shouldn't be an issue since each node will probably only relay its request.
But then again you're right, deeper analyses seems to be in order before I
could make such statements with any kind of certainty.

That said, I feel that this solution doesn't seem to be the best one. I'm
sure other Erlang programmers must have had to solve a similar problem,
maintaining shared data between different threads in a scalable way.

Thanks again,
/Fuad

On Mon, Jul 7, 2008 at 9:58 PM, Robert Raschke <rtrlists@REDACTED>
wrote:

> Hi Fuad,
>
> On 7/7/08, Fuad Tabba <fuad@REDACTED> wrote:
> > Thanks Robert,
> >
> > Yes I am familiar with Amdahl's law. That said, it's not true that all
> > writes need to be serialized; it's a binary tree, and unless the
> > modifications happen to be to the same node, there's no reason why they
> need
> > to be serialized.
> >
> > A database would solve the problem kinda; but that's like using a
> > steamroller to crack a nut. What I'm looking for a lightweight option
> that
> > could temporarily store data shared between threads, and would scale well
> as
> > the number of threads accessing/modifying this structure increases. I
> know
> > how to solve this problem in C (writing a concurrent binary search tree
> that
> > scales well), but since I'm new to Erlang I was unsuccessful trying to
> solve
> > this problem in Erlang.
>
> OK, I see.
>
> So, if I read your code correctly, your rpcInserts() and rpcDeletes()
> return quick, because your request is trickling down the tree on its
> own. Its the rpcContains() that need to wait for a proper result.
>
> It might be interesting to test the individual operations on their
> own, and then combinations thereof. That might give a quicker insight
> into what is causing the lack of speedup. My (uneducated) hunch is
> that the rpcContains() calls are where your "threads" are getting
> stuck (due in part to all the inserts and deletes clogging up the
> tree).
>
> Very unscientific, sorry.
>
> Robby
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20080708/1c7a366d/attachment.htm>