[erlang-questions] Maximum number of Mnesia nodes

Hakan Mattsson hakan@REDACTED
Tue Jul 31 10:14:47 CEST 2007



On Mon, 30 Jul 2007, denis wrote:

> Date: Mon, 30 Jul 2007 13:48:31 -0400
> From: denis <dloutrein.lists@REDACTED>
> To: 'Hakan Mattsson' <hakan@REDACTED>
> Cc: 'David King' <dking@REDACTED>, 'Joel Reymont' <joelr1@REDACTED>,
>     'Erlang Questions' <erlang-questions@REDACTED>
> Subject: RE: [erlang-questions] Maximum number of Mnesia nodes
> 
> 
> 
> > -----Original Message-----
> > From: Hakan Mattsson [mailto:hakan@REDACTED]
> > Sent: Monday, July 30, 2007 11:26
> > To: denis
> > Cc: 'David King'; 'Joel Reymont'; 'Erlang Questions'
> > Subject: RE: [erlang-questions] Maximum number of Mnesia nodes
> > 
> > 
> > If you replicate the tables to all nodes, the
> > performance of updates will be worse for each new node
> > that you add. I think that the performance characteristics
> > of such a transaction will approximately follow this
> > formula: C + N * P. Where N is the number of nodes, P
> > is the work performed for each (remote) transaction
> > participant and C is the (local) transaction coordinator
> > work that is independent of the number of nodes.
> > 
> 
> I thought that the transaction time increase was less
> than that.  Isn't the replication done in parallel? I
> mean, if we have 3 nodes, and we do an insert, for
> instance, the insert is forwarded to each node at the
> same time, and the coordinator waits for the
> confirmation of each node to validate the
> transaction. In this case, the formula would be
> approximately:
>      C + max(P)

Yes, you are right: much of the work that is performed
on behalf of the participants is done in parallel on
the remote nodes. But part of the work that the
coordinator performs also depends on how many
participants are involved.

So if we split P into the coordinator work that
depends on the number of participants (CP) and the
work (PP) that is done remotely by the participant
itself, we can come up with a more precise formula:
C + N * CP + max(PP).
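
As a rough illustration, the three estimates discussed in this thread can
be compared with a minimal Python sketch. The function names and all cost
values below are hypothetical placeholders chosen for the example, not
measurements of Mnesia; only the formulas come from the discussion.

```python
def linear_model(c, n, p):
    """First estimate from the thread: C + N * P."""
    return c + n * p


def parallel_model(c, pp_per_node):
    """Fully parallel estimate: C + max(P)."""
    return c + max(pp_per_node)


def refined_model(c, cp, pp_per_node):
    """Refined estimate: C + N * CP + max(PP).

    c           -- coordinator work independent of the node count
    cp          -- coordinator work per participant
    pp_per_node -- remote work done by each participant (one entry per node)
    """
    n = len(pp_per_node)
    return c + n * cp + max(pp_per_node)


# Hypothetical costs in milliseconds (assumptions, not measured values).
C, CP, PP = 2.0, 0.5, 7.5

for n in (1, 3, 10, 20):
    print(n,
          linear_model(C, n, CP + PP),
          parallel_model(C, [PP] * n),
          refined_model(C, CP, [PP] * n))
```

With these made-up numbers the refined estimate still grows linearly in N,
but with the small slope CP rather than the full per-participant cost P.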

> > But do you really need to replicate the data to all nodes?
> 
> In fact, the load is dispatched by a load balancer
> which dispatches requests to different servers. That's
> why we have one mnesia instance per server, with tables
> replicated. If one server crashes, the other servers
> have all the data to take over the load of the crashed one. The
> problem with this architecture is that if we add too many
> servers, we increase the load of mnesia replication.
> I'll try to find if we can use fragments in our case,
> or rethink the architecture.

Good luck!

/Håkan
 
> Denis
> 
> 
> > Even if you have relatively few records in your database,
> > fragmented tables can be very useful in order to distribute
> > the load over many nodes.
> > 
> > If you can identify some type of record in your
> > application that is being accessed in the majority of
> > your transactions, it is a good candidate for fragmented
> > table storage. This could be a bank account, a subscriber,
> > a session etc. etc. You can also co-locate records in other
> > tables with your main record (see "foreign_key" in Mnesia).
> > 
> > When your application needs to access such a record it
> > should determine one of the replica nodes for the table
> > and run the transaction on that node. If this can be
> > achieved for all transactions there will always be a
> > fixed number of (2-3?) nodes  involved in each transaction.
> > That is the (2?) nodes where the table is replicated plus
> > one of the nodes from where  transactions are forwarded.
> > If you distribute the fragments over more nodes it will
> > scale smoothly.
> > 
> > Of course it is not possible to achieve this for all
> > types of applications. But you should strive for that
> > kind of access pattern in order to achieve better
> > scalability.
> > 
> > /Håkan
> > 
> > On Fri, 27 Jul 2007, denis wrote:
> > 
> > > Date: Fri, 27 Jul 2007 14:57:14 -0400
> > > From: denis <dloutrein.lists@REDACTED>
> > > To: 'Hakan Mattsson' <hakan@REDACTED>
> > > Cc: 'David King' <dking@REDACTED>, 'Joel Reymont' <joelr1@REDACTED>,
> > >     'Erlang Questions' <erlang-questions@REDACTED>
> > > Subject: RE: [erlang-questions] Maximum number of Mnesia nodes
> > >
> > > Thanks Hakan for your response.
> > >
> > > If I understand correctly, fragmented tables are interesting when we
> > > have a high volume of data. That is not my case: I have around 100000
> > > records in 5 tables.
> > > I plan to have one mnesia instance per server (and one server per
> > > machine), each holding every table as ram_copies, replicated to the
> > > other servers. Each server uses its local mnesia instance (maybe that's
> > > not the best architecture?)
> > >
> > > My concern is what happens when, for instance, I do a delete or an
> > > insert into a table. The transaction succeeds only when the insert or
> > > delete is done on each replicated table. If I have only one server, the
> > > transaction time will be, for instance, 10ms. If I have two replicated
> > > servers, will it be 2*10ms? For N servers, what kind of factor can I
> > > expect? N, log(N), exp(N) ...?
> > > I'm not sure that I can run 20 servers, for instance, and keep good
> > > performance, depending on the response time of the transaction commit
> > > on each node.
> > >
> > > Thanks
> > > Denis
> > >
> > >
> > > > -----Original Message-----
> > > > From: Hakan Mattsson [mailto:hakan@REDACTED]
> > > > Sent: Friday, July 27, 2007 13:09
> > > > To: denis
> > > > Cc: 'David King'; 'Joel Reymont'; 'Erlang Questions'
> > > > Subject: Re: [erlang-questions] Maximum number of Mnesia nodes
> > > >
> > > >
> > > > The scalability of Mnesia depends heavily on your
> > > > access patterns and how you have configured Mnesia.
> > > >
> > > > If you ensure that the number of nodes involved in a
> > > > typical transaction is constant, Mnesia should scale
> > > > very well. One way of achieving linear scalability
> > > > characteristics, is to use the concept called
> > > > "foreign_key" in the chapter about fragmented
> > > > tables. The bench example (mnesia/examples/bench)
> > > > utilizes this technique.
> > > >
> > > > When I wrote the "bench" benchmark example it turned
> > > > out to scale almost perfectly linearly. (By distributing
> > > > the Mnesia tables over twice as many computers, the
> > > > number of processed transactions per second also
> > > > doubled.) But at that time I only had access to 10 (or
> > > > was it 16?) identical computers, so I cannot say
> > > > anything about how Mnesia scales beyond that. Worth
> > > > mentioning is that I also successfully ran the bench
> > > > example with fragmented tables distributed over all our
> > > > machines at the office (50+). But as those computers
> > > > had such different characteristics, it is impossible to
> > > > say anything about the scalability. It was fun that it
> > > > worked though.
> > > >
> > > > Chandru, do you still have the highscore of the number
> > > > of Mnesia nodes in a production environment?
> > > >
> > > > /Håkan
> > > >
> > > > On Fri, 27 Jul 2007, denis wrote:
> > > >
> > > > > Date: Fri, 27 Jul 2007 11:15:18 -0400
> > > > > From: denis <dloutrein.lists@REDACTED>
> > > > > To: 'David King' <dking@REDACTED>, 'Joel Reymont' <joelr1@REDACTED>
> > > > > Cc: 'Erlang Questions' <erlang-questions@REDACTED>
> > > > > Subject: Re: [erlang-questions] Maximum number of Mnesia nodes
> > > > >
> > > > > Still nobody has a response to this?
> > > > >
> > > > > I'm designing a server that embeds mnesia with ram_copies
> > > > > tables. Several instances of the server can be launched, and the
> > > > > mnesia tables are replicated.
> > > > > I have the same concern about how many instances I can run before
> > > > > transaction commits in mnesia become a problem.
> > > > >
> > > > > If someone has already used several replicated mnesia instances, I
> > > > > would like to have some numbers.
> > > > >
> > > > > Thanks
> > > > > Denis
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: erlang-questions-bounces@REDACTED
> > > > > > [mailto:erlang-questions-bounces@REDACTED] On Behalf Of David King
> > > > > > Sent: Monday, July 23, 2007 21:24
> > > > > > To: Joel Reymont
> > > > > > Cc: Erlang Questions
> > > > > > Subject: Re: [erlang-questions] Maximum number of Mnesia nodes
> > > > > >
> > > > > > Did you ever get any off-list responses to this? I'm curious too.
> > > > > >
> > > > > > On 15 Jul 2007, at 05:44, Joel Reymont wrote:
> > > > > >
> > > > > > > Folks,
> > > > > > >
> > > > > > > How many Mnesia nodes are you running in your production
> > > > > > > installation? I'm looking to find the maximum here.
> > > > > > >
> > > > > > > I'm only dealing with ram_copies tables (cache), trying to figure
> > > > > > > out whether I can make every Yaws node a Mnesia node without
> > > > > > > slowing transactions down too much.
> > > > > > >
> > > > > > > My current thinking is to wait and gather statistics before trying
> > > > > > > to decouple Yaws and Mnesia. Still, I would love to know how much
> > > > > > > transactions slow down with the addition of every new Mnesia node.
> > > > > > >
> > > > > > > 	Thanks, Joel
> > > > > > >
> > > > > > > --
> > > > > > > http://topdog.cc      - EasyLanguage to C# compiler
> > > > > > > http://wagerlabs.com  - Blog

