[erlang-questions] "Erlang plus BDB: Disrupting the Conventional Web Wisdom"

Ulf Wiger ulf@REDACTED
Thu Oct 11 22:25:18 CEST 2007


2007/10/11, Chris Newcombe <chris.newcombe@REDACTED>:
>
>  I've read several reports that mnesia doesn't
> handle network partitioning very well at all (to the point where it is
> best to disable auto-reconnection of Erlang nodes that are running
> mnesia), presumably because it was designed for telecoms switches
> where network partitioning is relatively rare.

I think mnesia does offer some decent support for resolving partitioned
networks. Disabling the auto-connect feature isn't mainly about keeping
mnesia from messing up: the application will automatically heal as
well, and may start writing to the database while the state of the
data is still undecided. There are also other players: global will
automatically resynch and start deconflicting names, and its default
method of deconflicting is to pick one candidate at random and slay
it (other methods can be specified per-name). Before alternative methods
became available, I had some particularly memorable moments when
the (globally registered) process in charge of trying to resolve the
partitioned network was brutally murdered by global at
the worst possible time. (:  This, of course, had nothing to do with
mnesia.
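
As an illustration of the per-name methods mentioned above, here is a
minimal sketch of a custom resolve function passed to
global:register_name/3. The module name, the registered name
partition_resolver, and the "lower node name wins" rule are all
hypothetical; global also ships global:random_notify_name/3 and
global:notify_all_name/3 as non-killing alternatives to the default
global:random_exit_name/3.

```erlang
-module(resolver_reg).
-export([register_self/0, keep_lower_node/3]).

%% Register the calling process under a (hypothetical) global name,
%% replacing the default resolver, which kills one of the two pids.
%% Returns yes | no, as global:register_name/3 does.
register_self() ->
    global:register_name(partition_resolver, self(),
                         fun ?MODULE:keep_lower_node/3).

%% Resolve callback: global calls it with the clashing name and both
%% pids when two partitions reconnect. It returns the pid that keeps
%% the name. The other pid is only notified, not killed.
%% Assumption: "lower node name wins" is just an example rule.
keep_lower_node(Name, Pid1, Pid2) ->
    {Keep, Drop} = case node(Pid1) =< node(Pid2) of
                       true  -> {Pid1, Pid2};
                       false -> {Pid2, Pid1}
                   end,
    Drop ! {global_name_conflict, Name},
    Keep.
```

The process that loses the name receives {global_name_conflict, Name}
and can decide for itself how to step down, instead of being killed in
the middle of resolving the partition.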

We have also seen rare occasions where particularly bad
network errors caused the node supervision heartbeat to
time out between some nodes, but not others, making it
extremely difficult to figure out even which parts of the system
were still functioning.

So disabling the automatic reconnect means that you have a fighting
chance of finding out what happened, before the software tries to
organically heal itself. Erlang's auto-connect semantics and distributed
nature are wonderful, most of the time, but can be a real pain in
certain situations. And there is no solution to partitioned networks
that works well in all situations.
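
For reference, automatic reconnection is controlled by the kernel
application parameter dist_auto_connect. A minimal config sketch,
assuming a release that reads a sys.config file (the same setting can
be given on the command line as "erl -kernel dist_auto_connect once"):

```erlang
%% sys.config (sketch)
%% once:  nodes connect automatically the first time only; after a node
%%        goes down it must be reconnected explicitly.
%% never: no automatic connections at all.
[{kernel, [{dist_auto_connect, once}]}].
```

With either setting, a lost node stays disconnected until something
calls net_kernel:connect_node/1 for it, which leaves room to inspect
the system first.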

One could imagine a few configurable strategies that mnesia could
employ automatically if a partitioned network occurs...
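
A rough sketch of what a configurable strategy hook might look like,
built on mnesia's existing system events. The module name, the strategy
atoms log_only and lowest_node_wins, and the selection rule are
hypothetical and not something mnesia provides. The sketch only detects
the condition and reports a decision; acting on it (for instance with
mnesia:set_master_nodes/1 followed by a restart of mnesia on the losing
side) is deliberately left out.

```erlang
-module(partition_watch).
-export([start/1, pick_survivor/2]).

%% Start a watcher process. Strategy is a hypothetical atom:
%% log_only | lowest_node_wins.
%% mnesia must already be running, otherwise subscribe/1 returns an
%% error tuple and the watcher process crashes on the match.
start(Strategy) ->
    spawn(fun() ->
                  {ok, _Node} = mnesia:subscribe(system),
                  loop(Strategy)
          end).

loop(Strategy) ->
    receive
        %% Reported by mnesia when it suspects the database has diverged,
        %% e.g. after a partitioned network.
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            handle(Strategy, Context, Node),
            loop(Strategy);
        _Other ->
            loop(Strategy)
    end.

handle(log_only, Context, Node) ->
    error_logger:warning_msg("inconsistent database (~p) with ~p~n",
                             [Context, Node]);
handle(lowest_node_wins, Context, Node) ->
    Survivor = pick_survivor(node(), Node),
    error_logger:warning_msg("inconsistent database (~p) with ~p; "
                             "strategy would keep data from ~p~n",
                             [Context, Node, Survivor]).

%% Example rule only: the node whose name sorts first wins.
pick_survivor(A, B) when A =< B -> A;
pick_survivor(_A, B) -> B.
```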

BR,
Ulf W


