[erlang-questions] Writing a Concurrent Binary Search Tree Data Structure in Erlang

Hynek Vychodil vychodil.hynek@REDACTED
Wed Jun 25 11:57:43 CEST 2008


You should store "persistent" state in server process. When you fall in
performance problem you can partition this state in sub processes and server
starts work as dispatcher. If dispatching is algorithmic you can spawn many
dispatchers of course, but it usually performs better. You can repeat this
fragmentation again and again on sub processes and you end up with process
per node. For performance reasons there will be best solution some sort of
hybrid structure. IMHO ets not performs fully scalable because I think there
is only one "process" which can every time update it.

On Wed, Jun 25, 2008 at 11:46 AM, Fuad Tabba <fuad@REDACTED> wrote:

> I understand that, but what if that structure (binary tree) is needed to
> store some sort of persistent state for all processes to see - reservations
> table for example or something like that. I know there's ETS, but what I'm
> trying to do is understand the basics...
>
> Thanks Hynek,
> /Fuad
>
>
> On Wed, Jun 25, 2008 at 9:40 PM, Hynek Vychodil <vychodil.hynek@REDACTED>
> wrote:
>
>> Because there is no shared state between processes in Erlang, I would
>> start think about process per tree node, not process per action as usual in
>> classic shared memory (for example C) implementations.
>> I know, it sounds weird, but may be help you :-)
>>
>> 2008/6/25 Fuad Tabba <fuad@REDACTED>:
>>
>>>  Hi,
>>>
>>> I'm trying to write a concurrent Binary Search Tree (BST) in Erlang with
>>> the operations insert/delete/contains. My goal is for multiple processes to
>>> be able to concurrently modify this structure and query it correctly.
>>>
>>> My problem is, I can't think of a way to take advantage of the inherent
>>> parallelism in the BST structure. BSTs should be easy to parallelize, since
>>> different inserts/deletes of nodes from the tree could be in completely
>>> different branches. In C for example, it's not hard to write a nonblocking
>>> implementation (lock-free) where adding nodes to the tree can happen
>>> concurrently (unless there's a conflict) using Compare and Exchange (CAS).
>>> Adding concurrency with deleting nodes is a bit more challenging but should
>>> still be doable.
>>>
>>> That said, I've only managed to think of an Erlang implementation that is
>>> capable of running contains() in parallel with an insert or delete. However,
>>> I cannot think of a way where I could have the server thread somehow
>>> distribute its updates if they are to nonconflicting branches. I feel like
>>> the solution should be a simple, recursive one, and that the answer is on
>>> the tip of my brain so to speak :) But I'm still baffled.
>>>
>>> My question is this; I can reason about shared memory objects and how
>>> concurrent accesses should be, either using locks or for the simpler cases
>>> using atomic instructions such as CAS. However, I'm finding it difficult to
>>> do that with Erlang/functional languages in general. I did a google search,
>>> couldn't really find any examples of what I'm looking for. Any tips or
>>> pointers?
>>>
>>> Here's what my code looks like (the whole thing is attached to this
>>> email). Apologies in advance for the quality of my code, still learning :)
>>>
>>> The basic idea, to be able to have contains() run in parallel with
>>> insertions and deletions, is that whenever the server gets a request to add
>>> something or remove something from the tree, its continue applying
>>> contains() to the old version until it receives the modifications back.
>>>
>>> The code for the tree itself is pretty much the same as that in
>>> Concurrent Programming in Erlang.
>>>
>>> % No changes pending for the tree
>>> parallelServer(Tree, stable, _) ->
>>>     receive
>>>         {Requester, insert, Key} ->
>>>             Self = self(),
>>>             spawn(fun() -> insertProcessor(Self, Key, Tree) end),
>>>             parallelServer(Tree, changing, Requester);
>>>
>>>         {Requester, delete, Key} ->
>>>             Self = self(),
>>>             spawn(fun() -> deleteProcessor(Self, Key, Tree) end),
>>>             parallelServer(Tree, changing, Requester);
>>>
>>>         {Requester, contains, Key} ->
>>>             spawn(fun() -> containsProcessor(Requester, Key, Tree) end),
>>>             parallelServer(Tree, stable, void);
>>>
>>>         {Requester, dump} ->
>>>             Requester ! Tree,
>>>             parallelServer(Tree, stable, void)
>>>     end;
>>>
>>> % Tree is being modified
>>> parallelServer(Tree, changing, Requester) ->
>>>     receive
>>>         {Requester, contains, Key} ->
>>>             spawn(fun() -> containsProcessor(Requester, Key, Tree) end),
>>>             parallelServer(Tree, changing, Requester);
>>>
>>>         {Requester, dump} ->
>>>             Requester ! Tree,
>>>             parallelServer(Tree, changing, Requester);
>>>
>>>         {update, NewTree} ->
>>>             Requester ! done,
>>>             parallelServer(NewTree, stable, void)
>>>     end.
>>>
>>>
>>> % insertProcessor and deleteProcessor cannot notify the requester
>>> directly to
>>> % ensure that parallelServer knows what the new tree is like first for
>>> % consistency.
>>>
>>> insertProcessor(Server, Key, Tree) ->
>>>     Server ! {update, insert(Key, Tree)}.
>>>
>>>
>>> deleteProcessor(Server, Key, Tree) ->
>>>     Server ! {update, delete(Key, Tree)}.
>>>
>>>
>>> containsProcessor(Requester, Key, Tree) ->
>>>     Requester ! contains(Key, Tree). % Can report it directly
>>>
>>>
>>>
>>> Thanks !
>>>
>>> Cheers,
>>> /Fuad
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>>
>>
>>
>>
>> --
>> --Hynek (Pichi) Vychodil
>
>
>


-- 
--Hynek (Pichi) Vychodil
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20080625/6c4c2f79/attachment.htm>


More information about the erlang-questions mailing list