[erlang-questions] Distributed/Fragmented Mnesia node crash

Tue Oct 9 15:58:01 CEST 2007

What errors do you get?

We know there is a lurking bug somewhere in the dets code.
We have seen 'bad object' and 'premature eof' errors every other
month for the last year. We have not been able to track the
bug down, since the dets file is repaired automatically
the next time it is opened.

As a temporary solution we restart mnesia whenever this error occurs...
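
A minimal sketch of what such a restart could look like, assuming it is
done on the local node only. The module name, the table list and the
60 second timeout are hypothetical; how the dets error is detected is not
covered here, since that is not described above.

```erlang
-module(mnesia_restart_sketch).
-export([restart/1]).

%% Stop and start the local mnesia application, then wait until the
%% given tables are loaded again. The 60000 ms timeout is an assumption.
%% Returns ok | {timeout, BadTabs} | {error, Reason} from wait_for_tables/2.
restart(Tables) ->
    stopped = mnesia:stop(),   % mnesia:stop/0 returns 'stopped' on success
    ok = mnesia:start(),
    mnesia:wait_for_tables(Tables, 60000).
```

It would be called as mnesia_restart_sketch:restart([my_table]) from
whatever code notices the error, where my_table is a placeholder name.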

Btw: it would be very nice if you had the time to write a little
trapexit-tutorial describing your fragmented setup.

Cheers, Tobbe

Eranga Udesh wrote:
> Hi,
> 
> I run 3 distributed Erlang nodes dedicated to running the Mnesia database
> server. Another Erlang node runs the application that reads from and writes
> to those tables. The schema is created as disc_copies on all 4 nodes, and the
> tables are created as fragmented tables distributed across the 3 dedicated
> Mnesia nodes, i.e. 99 fragments of a table spread over 3 nodes, giving 33
> fragments on each. There are about 250-350 DB writes/s and 500-800 DB reads/s.
> 
> Performance is quite good. I don't see "Mnesia overloaded" warnings. However,
> occasionally one of the Mnesia nodes crashes. Sometimes the time between 2
> crashes of the same Mnesia node is about 1-2 days, while the next crash
> takes about 10-20 days to happen. It doesn't even generate the crash_dump
> file, or if one is generated, its size is 0 bytes.
> 
> Any idea what could be the cause? Are others experiencing the same? I
> don't see any abnormal load, processor or memory usage when these crashes
> occur.
> 
> BRgds,
> - Eranga
> 
>
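
For readers unfamiliar with the setup described in the quoted message, a
table with 99 fragments over a pool of 3 nodes could be declared roughly
as below. This is only a sketch: the module name, the attribute list and
the storage type (n_disc_copies) are assumptions, since the original
message does not state them. With an evenly used node pool, Mnesia should
place about 33 fragments on each node.

```erlang
-module(frag_setup_sketch).
-export([table_opts/1, create/2, write/1, read/2]).

%% Options for a table split into 99 fragments over the given node pool.
%% One disc copy per fragment is an assumption; n_disc_only_copies
%% (dets-backed) or n_ram_copies are the other choices.
table_opts(DbNodes) ->
    [{attributes, [key, value]},
     {frag_properties,
      [{n_fragments, 99},
       {node_pool, DbNodes},
       {n_disc_copies, 1}]}].

%% Create the fragmented table on an already running Mnesia cluster.
create(Tab, DbNodes) ->
    mnesia:create_table(Tab, table_opts(DbNodes)).

%% Access must go through the mnesia_frag access module so that the
%% key is hashed to the right fragment.
write(Record) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:write(Record) end,
                    [], mnesia_frag).

read(Tab, Key) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:read(Tab, Key, read) end,
                    [], mnesia_frag).
```

The application node would then call, for example,
frag_setup_sketch:create(my_table, [db1@host, db2@host, db3@host]),
where the table and node names are placeholders.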