[erlang-questions] MNesia Distribution Questions

Wed Oct 21 16:49:46 CEST 2009

Hi Rob,

I guess the best I can do is point you to some useful resources:

- http://www.erlang.se/publications/mnesia_overview.pdf
- http://www.erlang.org/doc/apps/mnesia/Mnesia_chap1.html#1
- http://couchdb.apache.org/
- http://oreilly.com/catalog/9780596158163
- http://wiki.apache.org/couchdb/Frequently_asked_questions#why_no_mnesia

I advise you to take a quick look at these references and, if you
are still in doubt, ask the mailing list.
I hope this is useful.

Best regards,

Roberto Aloi
roberto.aloi@REDACTED
http://www.erlang-consulting.com
---

On Oct 21, 2009, at 3:18 PM, Rob Stewart wrote:

> Hi there folks,
> Very quickly about me:
> - I'm doing a Masters in Software Engineering
> - I'm looking closely at Mnesia to involve it in my study
>
>
> OK, so here's where I'm at. It's quite early on in my report, but I'm
> getting a feel for what could be a useful investigation. I am briefly
> discussing cloud computing, and then looking in more detail at
> distributed computation and data storage (along with fault tolerance,
> high availability, etc.). My supervisor and I have agreed that, to
> illustrate the use of a distributed computation system, I am going to
> perform some large computation on a large dataset, probably using Pig
> (a dataflow language on top of Hadoop). I have been given the
> university cluster to deploy Hadoop on.
>
> (Bear with me....)
>
> So... the big question for me right now is where to find a useful
> competitor (or rather a solution with similar goals). The easy option
> would be to compare Pig to another Hadoop interface, e.g. Hive, but
> those results would be pretty uninteresting. So instead, I'm looking
> into the realm of distributed databases. As far as I can tell, the way
> in which Mnesia distributes the availability of data across nodes is
> similar to how Hadoop distributes data across nodes in HDFS (the Hadoop
> file system). My issue here is my lack of understanding of how a data
> query computation is distributed over a network of Mnesia nodes. I have
> a good understanding of how this is achieved with Hadoop (if there are
> 10 datanodes, then each will get a tenth of the work), but is there
> such a thing as parallel query processing with Mnesia? Or is Mnesia
> just a way to very quickly replicate the availability of data?
>
> I hope that you gurus can shed some light on this for me. I'm not aware
> of exactly how Mnesia would deal with a data query where the Mnesia
> network consists of, say, 10 nodes. Does a user query just one of the
> 10, or does a user query the network? I'm really trying to think of a
> fair and interesting way to compare the concept of a distributed
> database (Mnesia) against a distributed processing engine (Hadoop).
>
> There are other things I want to delve into also. For instance, I
> really need to know more about the difference between CouchDB and
> Mnesia. So far, I can only establish that CouchDB is more useful for
> networks where nodes are likely to go offline at various times. (Not
> much knowledge!)
>
> If, however, comparing a distributed database engine against a
> distributed processing engine is a non-starter, let me know that too!
>
> Many thanks, I would really appreciate some feedback.
>
>
> Rob Stewart