[erlang-questions] Scalaris Questions

Mon Jul 28 14:08:33 CEST 2008

Hi Rudolph,

On Saturday 26 July 2008, Rudolph van Graan wrote:
> Hi Thorsten,
>
> I've had a quick look at Scalaris - It really looks like it may solve
> a lot of problems. Nice work!
>
> Some questions:
>
> 1. I've looked through the code and it seems that the final storage
> uses gb_trees (in cs_db_otp). Obviously this means that the current
> version does not support persistence? Or did I not see it?
That is correct.
> 2. If this is the case, how would you suggest getting persistence into
> Scalaris? For me this would be first priority - would you implement
> this using mnesia, i.e. disk tables? But with a schema per Scalaris
> node?
A first try could/should indeed use mnesia.
> 3. Would it be sufficient to abstract out cs_db_otp and replace it
> with say cs_db_mnesia or cs_db_dets? Or would you suggest using a log-
> based scheme?
I was thinking of adding a "db" behaviour similar to the routingtable scheme. 
We currently have two routingtable implementations (rt_simple and rt_chord) 
and you can switch between them by changing ?RT in chordsharp.hrl. We can do 
something similar for the db.
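
A minimal sketch of what such a switchable db could look like. This is only an
illustration: the module name db_sketch, the ?DB macro and the three functions
are assumptions made up for this example and are not part of Scalaris. The
idea is that every backend exports the same functions and callers only ever go
through the macro, just as with ?RT.

```erlang
%% Hypothetical sketch; none of these names exist in Scalaris.
-module(db_sketch).
-export([new/0, write/3, read/2, demo/0]).

%% In a real setup this define would live in a shared header, next to ?RT.
%% Switching the backend means changing this one line.
-define(DB, db_sketch).

%% gb_trees backend. A disk-based backend would export the same functions.
new() ->
    gb_trees:empty().

write(Key, Value, Db) ->
    %% enter/3 inserts the key or overwrites an existing value.
    gb_trees:enter(Key, Value, Db).

read(Key, Db) ->
    case gb_trees:lookup(Key, Db) of
        {value, Value} -> {ok, Value};
        none -> not_found
    end.

%% Callers use the macro and never name a backend module directly.
demo() ->
    Db0 = ?DB:new(),
    Db1 = ?DB:write("key", "value", Db0),
    ?DB:read("key", Db1).
```

A mnesia or dets backend would probably need an explicit open/close step in
addition, so the real interface is likely to be a bit larger than this.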

However, there is still one piece missing. Let's say your server crashed 
because of a power outage or the like, but all your data is persistent on 
disk. When you restart the server, you have to recreate the Scalaris nodes 
that were running in the VM. There can be several (admin:add_nodes/1). I guess 
it will be sufficient to store the id (see cs_keyholder) of each running node.
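
Storing the ids could be as simple as the following sketch. It is an
assumption-laden illustration: the module node_ids_sketch and its file format
are invented here, and it does not show how the ids would be handed back to
the nodes on restart. It only uses file:write_file/2 and file:consult/1 from
the standard library.

```erlang
%% Hypothetical sketch; module name and file format are made up.
-module(node_ids_sketch).
-export([save/2, load/1]).

%% Write the list of node ids to File as a single Erlang term.
save(File, Ids) when is_list(Ids) ->
    file:write_file(File, io_lib:format("~p.~n", [Ids])).

%% Read the ids back. A missing file is treated as "no nodes were running".
load(File) ->
    case file:consult(File) of
        {ok, [Ids]} when is_list(Ids) -> {ok, Ids};
        {ok, _Other} -> {error, bad_format};
        {error, enoent} -> {ok, []};
        {error, Reason} -> {error, Reason}
    end.
```

On startup one would call load/1 and recreate one node per stored id.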

Do you know any log-based Erlang dbs? It could be very interesting for 
implementing snapshot algorithms. We would be interested in placing a marker 
into the db and later grabbing a snapshot of the database at the version 
given by the marker.
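
To make the marker idea concrete, here is a toy in-memory sketch. It is not an
existing library and not how Scalaris works; the module log_db_sketch and its
functions are assumptions for illustration only. Every write is appended to a
log with a version number, a marker is just the current version, and a
snapshot replays the log up to that marker.

```erlang
%% Hypothetical sketch of a log-based store with snapshot markers.
-module(log_db_sketch).
-export([new/0, write/3, marker/1, snapshot/2]).

%% The db is {NextVersion, Log}; the log holds the newest entry first.
new() ->
    {0, []}.

%% Append a write to the log and bump the version counter.
write(Key, Value, {Next, Log}) ->
    {Next + 1, [{Next, Key, Value} | Log]}.

%% A marker is the number of writes recorded so far.
marker({Next, _Log}) ->
    Next.

%% Rebuild the state as of Marker: replay, oldest first, every entry
%% that was written before the marker was taken.
snapshot(Marker, {_Next, Log}) ->
    Old = [E || {V, _, _} = E <- Log, V < Marker],
    lists:foldl(fun({_V, K, Val}, Tree) -> gb_trees:enter(K, Val, Tree) end,
                gb_trees:empty(),
                lists:reverse(Old)).
```

Writes that happen after marker/1 was called do not show up in the snapshot,
because their version numbers are not below the marker.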

> I have looked through the slides/videos presented in London, and a
> couple of questions popped up:
>
> 1. As far as I can see you loaded the entire English Wikipedia into
> 320 + 20 Erlang nodes? So the entire content was distributed into
> memory using gb_trees?
We ran the demonstrations with the Bavarian and the Simple English dbs. The 
full English Wikipedia is larger. All requests were served out of memory, 
i.e. the gb_trees.

> 2. How did the performance of this implementation compare with the
> real Wikipedia compensating for the number of machines etc?
At around 15 servers we matched the Wikipedia performance. When going beyond 
20 servers, the load generator became the bottleneck.
> 3. In terms of the Java interface for your Wikipedia demo, did you use
> the OtpNode class for communication or only the Term Libraries with a
> custom transport?
It is based on an OtpConnection, see 
scalaris/java-api/src/de/zib/chordsharp
in the svn (http://code.google.com/p/scalaris).

Thorsten