[erlang-questions] MNesia replication in ec2

Tue Apr 8 23:20:39 CEST 2014

On EC2 you will inevitably experience network conditions that cause mnesia to decide that it is partitioned. There is no way to recover from a partition event[1] while mnesia stays online: you have to stop and restart the mnesia application. Unless you design your app to enter a degraded mode of operation, in which all mnesia operations are either avoided or have their errors caught, you'll experience a lot of trouble with it.
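As a rough illustration of the degraded-mode idea, here is a minimal sketch, not a tested recipe. It assumes mnesia is already running when the watcher starts. The module name mnesia_guard, the functions start/1 and safe_read/2, and the {mnesia_partitioned, ...} message are made up for this example; only mnesia:subscribe/1, mnesia:dirty_read/2 and the inconsistent_database system event come from mnesia itself.

```erlang
-module(mnesia_guard).
-export([start/1, safe_read/2]).

%% Hypothetical helper: spawn a watcher that tells Owner when mnesia
%% reports a partition, so Owner can switch to degraded mode.
start(Owner) when is_pid(Owner) ->
    spawn(fun() ->
        %% subscribe/1 returns {error, Reason} if mnesia is not running;
        %% the match then fails and the watcher simply crashes.
        {ok, _Node} = mnesia:subscribe(system),
        watch(Owner)
    end).

watch(Owner) ->
    receive
        %% Context is running_partitioned_network or
        %% starting_partitioned_network; Node is the other side.
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            Owner ! {mnesia_partitioned, Context, Node},
            watch(Owner);
        _Other ->
            %% Ignore every other system event in this sketch.
            watch(Owner)
    end.

%% Hypothetical wrapper: turn the exit raised by a failed dirty read
%% into an ordinary return value the caller can handle.
safe_read(Tab, Key) ->
    try
        {ok, mnesia:dirty_read(Tab, Key)}
    catch
        exit:{aborted, Reason} ->
            {error, Reason}
    end.
```

The process that receives {mnesia_partitioned, Context, Node} would stop issuing mnesia calls (or route them all through something like safe_read/2) until mnesia has been stopped and restarted. Which side's data to keep after the restart is a separate decision that this sketch does not address.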

Additionally, mnesia recovers by copying the entire table from another node. This is fine(ish) for small datasets, but if your dataset grows over time, you'll eventually find yourself unable to bring new or recovering nodes online quickly, which blows up your Mean Time To Repair.

Honestly, I think multi-data-center Riak sounds like a better fit for multi-region EC2 work, but I have not personally used Riak in that scenario, so I defer to others.

-Geoff

[1] Pretty sure that's still true?

On 2014-04-08, at 14:14 , Ryan Brown <ryankbrown@REDACTED> wrote:

> We are in the process of laying the groundwork for deploying our application to EC2. We currently depend upon mnesia replication for a small amount of shared metadata in our datacenter. I've read a number of posts/articles that have given me pause when thinking about how mnesia will work across AZs and regions in EC2, primarily regarding network partitioning. We are currently considering removing our use of mnesia and pushing the small amount of data we replicate to a shared store (possibly Riak).
> 
> Does anybody out there have experience with this or guidance?
> 
> -- 
> -rb