[erlang-questions] Mnesia vs. timeouts

Thu Sep 2 10:49:02 CEST 2010

cupira <igorrs@REDACTED> wrote:
> Hello.
>
> I've recently had a problem with a pool of servers running Mnesia.
> The clients reach the pool through a load balancer and the tables
> (most of which are disc_only_copies) are fragmented and replicated
> throughout the servers. This means that Mnesia, on each server
> handling a request, will usually need to contact other servers.

This is a rather unfortunate setup that does not scale. To make it
scale, you would need to arrange things so that each request only
accesses local replicas. Mnesia has a concept of foreign keys
that can be used to co-locate fragments from different tables.
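
A minimal sketch of the foreign key idea, assuming two hypothetical
tables (customer and purchase) that are not part of the original setup.
The fragment count, replica count and record layouts are placeholders,
and the sketch only shows the frag_properties involved; whether it
keeps every request local depends on how the requests are routed.

```erlang
-module(frag_sketch).
-export([create_tables/1, add_purchase/3]).

%% Hypothetical records, used only for illustration.
-record(customer, {id, name}).
-record(purchase, {id, customer_id, item}).

%% NodePool is the list of nodes that may hold fragments. Mnesia must be
%% running on them with a disc-based schema, since disc_only_copies is
%% used, and the pool needs at least two nodes for two replicas.
create_tables(NodePool) ->
    {atomic, ok} =
        mnesia:create_table(customer,
            [{attributes, record_info(fields, customer)},
             {frag_properties, [{node_pool, NodePool},
                                {n_fragments, 8},
                                {n_disc_only_copies, 2}]}]),
    %% foreign_key: choose the fragment for a purchase record from its
    %% customer_id field, interpreted as a key in the customer table,
    %% instead of from the record's own key.
    {atomic, ok} =
        mnesia:create_table(purchase,
            [{attributes, record_info(fields, purchase)},
             {frag_properties, [{foreign_key, {customer, customer_id}}]}]),
    ok.

%% Fragmented tables are accessed through the mnesia_frag access module.
add_purchase(Id, CustomerId, Item) ->
    Write = fun() ->
                mnesia:write(#purchase{id = Id,
                                       customer_id = CustomerId,
                                       item = Item})
            end,
    mnesia:activity(sync_dirty, Write, [], mnesia_frag).
```

With this layout a purchase is meant to land in the fragment with the
same number as its customer, so a request routed to the node holding
that customer's fragment could be served from local replicas.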

> The problem happened when the number of requests from the clients was
> suddenly multiplied by 100 for a few seconds. The kernel (system)
> CPU time immediately reached the top on every server and the clients
> began to time out and later retry the failed requests (which worsened
> the problem, of course).
> From the logs in the servers, I noticed some dirty reads taking more
> than 10 minutes to finish (way after the clients had given up on those
> operations).
>
> My question is: how can I set a timeout for every Mnesia activity
> (which may be distributed) and make sure that, after that time, no
> operation related to that activity will be left hanging on any node?
> By just killing the process that called mnesia:activity, am I
> guaranteed to get that result?

Mnesia has no such timeout. I would not recommend killing
Mnesia-related processes, even though Mnesia is designed to cope with that.
The core problem seems to be that your application does not scale. Killing
processes in a panic does not solve the real problem.

/Håkan