[erlang-questions] Concurrency Orientated Design question

Mon Feb 7 23:36:54 CET 2011

On Mon, Feb 7, 2011 at 22:20, Marcel Meyer <marcel.meyer@REDACTED> wrote:
> Haha, I was actually wondering if each node could be a process, but some
> trees have 100,000's of nodes, so that didn't seem right?

Think in B-trees. Each node has a "page" of depth 8, say and up to 256
Pid links to the next nodes in the tree. This divides down the number
of active processes you need but also increases the price of updating
slightly. I've never tried to build something like this by the way.

> By scenario 3 do you mean to store the hierarchical data as a flat list?

Yes and no. I meant to store a node ini ETS as:

{Identifier, NodeData}

where Identifier is something make_ref() returns to me. And NodeData
is a list [{file, FileMeta} | {deferred_directory, Identifier}]. The
price is that you are copying these to the process doing lookups and
this of course has a price, especially since we throw out data right
afterwards. But the advantage is that this has really good parallel
access properties. Again, you can toy with the idea of storing larger
chunks of data as terms at the expense of more costly updates. The
idea is that we have the deferred_directory as a continuation for our
data, should we wish to walk further down that path in the hierarchy,
but we don't pay up front for the price of loading it. Essentially it
is a lazy load.

> I've done some analysis on this, and even with graph optimization like
> pre-ordered tree traversal, traversing the actual tree was still faster.
> There are some business scenarios for example where access has been removed
> for a user on a node, and access to lower down nodes have to be inferred, so
> having it as a proper graph is handy.

I just wanted to make sure you had considered these options as well.
Personally, I'd probably try the paging model.

-- 
J.