mnesia cluster limits

Fri Oct 11 11:05:04 CEST 2002

Some time ago I wrote an HLR-like benchmark program that
scaled almost linearly from 2 to 16 machines.
(There were only 16 equivalent machines available.)
In that configuration I used fragmented tables
(some with a foreign key) where each fragment had 2 ram_copies.
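
For readers unfamiliar with fragmented tables, a setup of that kind
might be sketched roughly as below. This is only an illustration: the
module name, table names, attributes and fragment count are
hypothetical and are not taken from the benchmark itself.

```erlang
-module(frag_sketch).
-export([create_tables/1, read_subscriber/1]).

%% Nodes: running Mnesia nodes that already share a schema (at least 2,
%% since every fragment gets 2 ram_copies). All names are hypothetical.
create_tables(Nodes) ->
    %% Primary table: one fragment per node, each fragment replicated
    %% as ram_copies on 2 nodes picked from the node pool.
    SubProps = [{n_fragments, length(Nodes)},
                {node_pool, Nodes},
                {n_ram_copies, 2}],
    {atomic, ok} = mnesia:create_table(subscriber,
                       [{attributes, [id, profile]},
                        {frag_properties, SubProps}]),
    %% Secondary table: fragmented via a foreign key, so a session
    %% record lands in the fragment matching its subscriber (sub_id).
    SessProps = [{foreign_key, {subscriber, sub_id}}],
    {atomic, ok} = mnesia:create_table(session,
                       [{attributes, [session_id, sub_id]},
                        {frag_properties, SessProps}]),
    ok.

%% Fragmented tables are accessed through the mnesia_frag access module.
read_subscriber(Id) ->
    Fun = fun() -> mnesia:read({subscriber, Id}) end,
    mnesia:activity(transaction, Fun, [], mnesia_frag).
```

With a layout like this, each transaction should only involve the nodes
holding the fragment it touches, not the whole cluster.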

I also ran the same program successfully on 54 machines,
but since the hardware varied a lot I could not draw any
performance conclusions.

If you have 100 equivalent machines in a switched network,
it would be very interesting to hear about how Mnesia scales
in that environment.

My benchmark program is available in

   $ERL_TOP/mnesia/examples/bench

/Håkan

On Fri, 11 Oct 2002, Dan Gudmundsson wrote:

> Date: Fri, 11 Oct 2002 08:12:39 +0200
> From: Dan Gudmundsson <dgud@REDACTED>
> To: Hal Snyder <hal@REDACTED>
> Cc: erlang-questions@REDACTED
> Subject: mnesia cluster limits
> 
> 
> I haven't tested, why don't you do it and tell me :-)
> 
> The problem is the startup, where 100 nodes are trying to talk to each
> other, discussing which nodes should load which table from disc,
> which table should be loaded from which node, and so on...
> 
> The number of messages between nodes during startup is something that
> I have to try to reduce sometime; it currently works on the principle
> of 'better safe than sorry'.
> 
> You may also find some bottlenecks in the code, i.e. naive operations:
> key searching through lists of tagged tuples may not be the best choice
> when every list has 100 elements instead of 3.
> 
> Once every node has started and loaded all tables, the cluster size
> should not add any overhead to standard transactions, i.e. the overhead
> depends on the number of nodes which have a local copy of the table.
> 
> The problem will be schema_transactions (create_table,
> add_table_copy, del_table_copy, ...), which take a lock on
> the schema, and the schema exists on all 100 nodes.
> I don't think I can do much about the schema_transactions.
> 
> /Dan
> 
> Hal Snyder writes:
>  > We have been arguing amongst ourselves as to the advisability of
>  > creating a cluster of between 20 and 100 OTP nodes, all running
>  > mnesia, all with the same erlang and mnesia cookies. Individual tables
>  > would be replicated sparsely, only where needed.
>  > 
>  > What is the overhead of a full mesh of n mnesia nodes, not counting
>  > replicated transactions? At what point does it become prohibitive?
>  > 
>  > 
>  > P.S.: I was just at the Erlang workshop at PLI2002, but foolishly
>  > forgot to ask the above question of the experts. The workshop
>  > was quite informative, and more fun than a barrel of monkeys.
>