[erlang-questions] MNesia Distribution Questions
Wed Oct 21 16:49:46 CEST 2009
I guess the best I can do is point you to some useful resources:
I advise you to take a quick look at these references and, if you
are still in doubt, ask the mailing list.
Hope this is useful.
On Oct 21, 2009, at 3:18 PM, Rob Stewart wrote:
> Hi there folks,
> Very quickly about me:
> - I'm doing a Masters in Software Engineering
> - I'm looking closely at Mnesia to involve it in my study
> OK, so here's where I'm at. It's quite early on in my report, but I'm
> getting a feel for what could be a useful investigation. I am briefly
> discussing cloud computing, and then looking in more detail at
> distributed computation and data storage (along with fault tolerance,
> high availability, etc.). My supervisor and I have agreed that, to
> illustrate the use of a distributed computation system, I am going to
> perform some large [...] on a large dataset, probably using Pig (a
> dataflow language on top of Hadoop). I have been given the university
> cluster to deploy Hadoop.
> (Bear with me....)
> So... the big question for me right now is where to find a useful
> competitor (or rather a solution with similar goals). The easy option
> would be to compare Pig to another Hadoop interface, e.g. Hive, but the
> results would be pretty uninteresting. So instead, I'm looking into the
> realm of distributed databases. As far as I'm concerned, the way in
> which Mnesia distributes the availability of data across nodes is
> similar to how Hadoop distributes data across the HDFS (Hadoop file
> system) nodes. My issue here is my lack of understanding of how a data
> query computation is distributed over a network of Mnesia nodes. I have
> a [...] understanding of how this is achieved with Hadoop (if there are
> 10 datanodes, then each will get a tenth of the work), but is there such
> a thing as parallel query processing with Mnesia? Or is Mnesia just a
> way to very, very quickly replicate the availability of data?
> I hope that you gurus can shed some light on this for me. I'm not aware
> of exactly how Mnesia would deal with a data query where the Mnesia [...]
> consists of, say, 10 nodes. Does a user query just one of the 10, or
> does a user query the network? I'm really trying to think of a fair and
> [...] way to compare the concept of a distributed database (Mnesia)
> against a distributed processing engine (Hadoop).
> There are other things I want to delve into as well. For instance, I
> need to know more about the difference between CouchDB and Mnesia. So
> far, I can only establish that CouchDB is more useful for networks where
> nodes are likely to go offline at various times. (Not much knowledge!!)
> If, however, comparing a distributed database engine against a
> distributed processing engine is a non-starter, let me know that too!
> Many thanks, I would really appreciate some feedback.
> Rob Stewart
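
To make the distinction in the question above concrete, here is a rough
sketch of the two models being contrasted: one where each of 10 nodes holds
a tenth of the data and does a tenth of the work, and one where every node
holds a full copy and a single node answers the whole query. It is a toy
model in plain Python, not Hadoop's or Mnesia's API. All function names
(partition, partitioned_query, replicated_query) are made up for
illustration, and threads merely stand in for nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_nodes):
    """Split records into n_nodes round-robin slices, one per simulated node."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def count_matching(records, predicate):
    """The 'query': count the records that satisfy the predicate."""
    return sum(1 for r in records if predicate(r))


def partitioned_query(records, n_nodes, predicate):
    """Each simulated node scans only its own slice; partial counts are summed.

    Returns the total count and the number of records each node scanned.
    """
    shards = partition(records, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(lambda s: count_matching(s, predicate), shards))
    return sum(partials), [len(s) for s in shards]


def replicated_query(records, n_nodes, predicate, node=0):
    """Every simulated node holds a full copy; one chosen node scans all of it.

    Returns the count and the number of records that single node scanned.
    """
    replicas = [list(records) for _ in range(n_nodes)]
    return count_matching(replicas[node], predicate), len(replicas[node])


if __name__ == "__main__":
    data = list(range(1000))
    is_even = lambda r: r % 2 == 0
    print(partitioned_query(data, 10, is_even))
    print(replicated_query(data, 10, is_even))
```

Both calls return the same answer; what differs is how many records each
node has to scan, which is the property a fair comparison would need to
measure.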