[erlang-questions] "actor database" - architectural strategy question

Tue Feb 18 14:57:29 CET 2014

On 02/18, Miles Fidelman wrote:
> 
> For scaling, the next step would be to swap out to disk - if that happens as
> a hibernate to disk, all the other functionality would stick around (as
> opposed to an application level write-state-to-disk).  Hence the question
> re. internals of the hibernate BIF and the scheduler, and how one might
> think about wiring in a hibernate-to-disk function.
> 

Be aware of the implicit state that exists within an application:
pending requests, messages in a mailbox, links and monitors being set
up, code versions being in use, sockets and file descriptors, etc.

These things can be backed up, but restoring them does not guarantee a
workable process with an equivalent environment. You either have to
write code that can deal with that, or prevent code that cannot deal
with it from being written.

Never mind the really hard problem of finding consistent snapshots if
your actors interact. For example, if I have processes A, B, and C
communicating with and influencing each other, I likely want my
snapshots of A, B, and C to form a single snapshot that loads them all
back up in a sane state.

This means that if A sent a message to B, then I have to make sure my
snapshots of A and B are taken both after A sent the message and after
B received (or ideally: handled) it. Otherwise I have to be ready to
reload state where B possesses information that A doesn't know about,
even though A sent it. This may lead to duplicated messages on
reception, or amount to plain state corruption.
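To make the A/B condition concrete, here is a minimal Python sketch (not something from this thread) that checks a proposed cut against a set of messages. It assumes each process has a numbered history of local events and that a snapshot of process P covers its first cut[P] events; the names Message, classify, and is_consistent are hypothetical. A message received inside the cut but sent outside it is an "orphan" (B knows about something A never sent), which makes the cut inconsistent. A message sent inside the cut but received outside it is merely in transit: that is allowed, but it has to be saved alongside the snapshot or it is lost on reload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    send_index: int  # index of the send event in the sender's local history
    recv_index: int  # index of the receive event in the receiver's local history


def classify(cut, messages):
    """Split messages into orphans and in-transit messages for a given cut.

    cut maps each process name to how many of its local events the snapshot
    covers: event i of process P is inside the cut iff i < cut[P].
    """
    orphans, in_transit = [], []
    for m in messages:
        sent = m.send_index < cut[m.sender]
        received = m.recv_index < cut[m.receiver]
        if received and not sent:
            # Receiver's snapshot knows about a message the sender's snapshot never sent.
            orphans.append(m)
        elif sent and not received:
            # Legitimate, but must be saved as channel state or it is lost on reload.
            in_transit.append(m)
    return orphans, in_transit


def is_consistent(cut, messages):
    # A cut is consistent iff it contains no orphan messages.
    orphans, _ = classify(cut, messages)
    return not orphans
```

With A sending one message to B (event 0 on both sides), the cut {A: 0, B: 1} is inconsistent, {A: 1, B: 0} is consistent with one message in transit, and {A: 1, B: 1} is consistent with nothing in transit.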

Again, it's a really, really hard problem (especially if processes crash
while you're trying to get things snapshotted!), and there are
algorithms to work around it. For a quick intro, see:

- http://en.wikipedia.org/wiki/Snapshot_algorithm for a basic one, where
  you can possibly already spot a bunch of issues with trying to make it
  work.
- http://www.cs.mcgill.ca/~lli22/575/mattern93efficient.pdf for more
  algorithms and explanations.
- 'consistent cut' is the term for that snapshotting sweet spot, and the
  algorithms above are all ways of finding one. Googling around for
  'consistent cut' will yield plenty of good intro material.
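As a rough illustration of the basic marker-based approach, here is a toy Python simulation of the classic Chandy-Lamport algorithm. It is a sketch under strong assumptions: reliable FIFO channels between every pair of processes, no crashes, and a single snapshot at a time. Those are exactly the kinds of assumptions you would have to question for real Erlang processes. The System/Process classes and the "balance" application state are hypothetical stand-ins for real actor state. When a process records its state it sends a marker on every outgoing channel; on each incoming channel it then records whatever arrives until that channel's marker shows up.

```python
from collections import deque

MARKER = object()  # sentinel distinguishable from any application message


class Process:
    def __init__(self, name, balance, peers):
        self.name = name
        self.balance = balance        # live application state
        self.peers = peers            # names of every other process
        self.recorded_state = None    # local snapshot, once taken
        self.recording = {}           # incoming channel -> messages seen since recording
        self.channel_state = {}       # incoming channel -> final recorded in-transit messages


class System:
    """Toy message-passing system with reliable FIFO channels between every pair."""

    def __init__(self, balances):
        names = list(balances)
        self.channels = {(a, b): deque() for a in names for b in names if a != b}
        self.procs = {n: Process(n, balances[n], [p for p in names if p != n])
                      for n in names}

    def transfer(self, src, dst, amount):
        # Application step: move `amount` from src to dst via a message.
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(amount)

    def start_snapshot(self, name):
        self._record(self.procs[name])

    def _record(self, p):
        # Record local state, start recording every incoming channel,
        # then send a marker on every outgoing channel.
        p.recorded_state = p.balance
        p.recording = {q: [] for q in p.peers}
        for q in p.peers:
            self.channels[(p.name, q)].append(MARKER)

    def deliver(self, src, dst):
        # Deliver the oldest message on channel src -> dst.
        msg = self.channels[(src, dst)].popleft()
        p = self.procs[dst]
        if msg is MARKER:
            if p.recorded_state is None:
                self._record(p)  # first marker: channel src -> dst is recorded as empty
            p.channel_state[src] = p.recording.pop(src)
        else:
            p.balance += msg
            if src in p.recording:  # arrived after recording, before src's marker
                p.recording[src].append(msg)

    def snapshot_done(self):
        return all(p.recorded_state is not None and len(p.channel_state) == len(p.peers)
                   for p in self.procs.values())

    def snapshot(self):
        states = {n: p.recorded_state for n, p in self.procs.items()}
        in_flight = {(src, n): msgs
                     for n, p in self.procs.items()
                     for src, msgs in p.channel_state.items() if msgs}
        return states, in_flight


def run_example():
    s = System({"A": 100, "B": 50, "C": 0})
    s.transfer("A", "B", 10)
    s.start_snapshot("A")
    s.transfer("B", "C", 5)
    s.deliver("A", "B")  # the 10 arrives before B has recorded anything
    s.deliver("A", "B")  # marker: B records its state
    s.deliver("A", "C")  # marker: C records its state
    s.deliver("B", "C")  # the 5 arrives: recorded as in transit on B -> C
    for src, dst in [("B", "C"), ("B", "A"), ("C", "A"), ("C", "B")]:
        s.deliver(src, dst)  # remaining markers close the other channels
    return s
```

In run_example(), the snapshot records A=90, B=55, C=0 plus 5 in transit on B -> C, which adds up to the original total of 150 even though no process was ever paused. Drop the FIFO or no-crash assumption and that guarantee goes away.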

My personal opinion is that this stuff is really hard to get right. I
hope for your sake that your processes do not interact with each other
too much :)

Regards,
Fred.