Mnesia vs. timeouts

Thu Sep 2 07:24:32 CEST 2010

Hello.

I've recently had a problem with a pool of servers running Mnesia.
The clients reach the pool through a load balancer and the tables
(most of which are disc_only_copies) are fragmented and replicated
throughout the servers. This means that Mnesia, on each server
handling a request, will usually need to contact other servers.

The problem happened when the number of requests from the clients
suddenly increased 100-fold for a few seconds. The kernel (system)
CPU time immediately maxed out on every server and the clients
began to time out and later retry the failed requests (which worsened
the problem, of course).
From the logs on the servers, I noticed some dirty reads taking more
than 10 minutes to finish (way after the clients had given up on those
operations).

My question is: how can I set a timeout for every Mnesia activity
(which may be distributed) and make sure that, after that time, no
operation related to that activity will be left hanging on any node?
By just killing the process that called mnesia:activity, am I
guaranteed to get that result?
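
To make the last point concrete, this is roughly what I mean by
"killing the process that called mnesia:activity". It is only a sketch:
the module and function names (timed_activity, run/3) are made up for
illustration, and it only shows that the local caller gets killed.
Whether that also stops work already started on other nodes is exactly
what I don't know.

```erlang
-module(timed_activity).
-export([run/3]).

%% Hypothetical helper: run Fun inside an Mnesia activity in a separate,
%% monitored process and kill that process if it has not finished within
%% Timeout milliseconds. AccessContext is anything mnesia:activity/2
%% accepts (transaction, async_dirty, ...).
run(AccessContext, Fun, Timeout) ->
    {Pid, MRef} = spawn_monitor(fun() ->
        %% Hand the result back through the exit reason.
        exit({done, mnesia:activity(AccessContext, Fun)})
    end),
    receive
        {'DOWN', MRef, process, Pid, {done, Result}} ->
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The activity itself failed (e.g. {aborted, _}).
            {error, Reason}
    after Timeout ->
        %% Kill the local caller only; nothing here reaches other nodes.
        exit(Pid, kill),
        %% Consume the 'DOWN' message that follows the kill.
        receive {'DOWN', MRef, process, Pid, _} -> ok end,
        {error, timeout}
    end.
```

A call would look like timed_activity:run(async_dirty, Fun, 5000),
which returns {ok, Result}, {error, Reason} or {error, timeout}.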

Thanks.
Igor.

-- 
"The secret of joy in work is contained in one word - excellence. To
know how to do something well is to enjoy it." - Pearl S. Buck.