[erlang-questions] "actor database" - architectural strategy question

Tue Feb 18 00:19:43 CET 2014

On Mon, Feb 17, 2014 at 7:04 PM, Miles Fidelman <mfidelman@REDACTED>
wrote:
>
> If I wanted to model this as a standard database, or serializing state into
> a traditional database, I wouldn't be asking the questions I asked. Can
> anybody talk to the questions I actually asked, about:
> - handling large numbers of actors that might persist for years, or decades
> (where actor = Erlang-style process)
> - backing up/restoring state of long-running actors that might crash
> - multi-cast messaging among actors

Hi,

Some time ago I was part of a team which created software to manage phone
number migration between mobile operators. Say you want to change your cell
phone provider while keeping your number (operators must support this in the
EU and in many other countries). We were the entity responsible for that
process.

One portability request is one process. At any time we could have had up to
1M processes (practically it was much lower, but we used this number when
designing the system). A "portability process" is a finite state machine
with communication in SOAP between two or three parties with many internal
state changes.

A single process could last from a few hours up to a few months (median ~3-4
days), with each state taking 10-100KB of uncompressed text (mean ~15KB).

Having Erlang processes allowed very nice things like straightforward
programming of state transitions during timeouts.

Strict consistency requirements meant we had checkpoints in a key-value
store for every operation for every process, which was managed globally.
From those checkpoints it was possible to re-create the state by replaying
all actions.
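
The checkpoint-and-replay idea can be sketched roughly as follows. This is a
minimal illustration in Python, not the code we ran: the transition table,
the CheckpointStore class (an in-memory stand-in for the key-value store) and
all state and action names are made up for the example. The same shape maps
onto an Erlang fold over the logged actions.

```python
# Hypothetical transition table for a toy portability state machine.
TRANSITIONS = {
    ("requested", "donor_accepted"): "scheduled",
    ("requested", "donor_rejected"): "rejected",
    ("scheduled", "ported"): "done",
}


class CheckpointStore:
    """In-memory stand-in for the key-value store (process id -> action log)."""

    def __init__(self):
        self._log = {}

    def append(self, proc_id, action):
        # One checkpoint per operation, kept in arrival order.
        self._log.setdefault(proc_id, []).append(action)

    def actions(self, proc_id):
        return list(self._log.get(proc_id, []))


def apply_action(state, action):
    """Pure transition function: the same inputs always give the same state."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {action!r}") from None


def recover(store, proc_id, initial="requested"):
    """Rebuild a crashed process's state by replaying its checkpointed actions."""
    state = initial
    for action in store.actions(proc_id):
        state = apply_action(state, action)
    return state


store = CheckpointStore()
store.append("req-1", "donor_accepted")
store.append("req-1", "ported")
state = recover(store, "req-1")  # state == "done"
```

Replay only works if the transition function is pure, so side effects such
as sending SOAP messages have to live outside of it.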

We did not really manage to fully implement a proper addressing mechanism
for non-volatile message sending. We invented our own PIDs which had some
sort of address / node ownership information. The mechanism was complex and
imperfect, nothing really to learn from. AMQP might be a good candidate
though.

Note that some of the details above are not exactly true (esp. numbers),
because I can't remember all the details.

A few remarks:
1. Do *not* store the full state every time you change it. Implement a diff
mechanism on your abstract state tree (it's strictly defined, right?), test
it using PropEr and use that. If you require fast recovery in case of a
crash, checkpointing is OK, but never drop the old state. You might dispute
a state transition after months, go fix the bug and want to re-run a
particular process's transitions again next year... Ugh.
2. Long-lived processes (weeks+) are perfectly fine for the Erlang VM. Just
make sure to hibernate them after some minutes of inactivity. You can
easily have hundreds of thousands of them, consuming basically no CPU and
just enough memory to keep the internal state.
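
To make remark 1 a bit more concrete, here is a rough sketch of a diff over
a state tree. It is written in Python and is only an illustration: it
assumes the state is a nested dict with string keys, and the names diff,
apply_patch and _MISSING are invented for the example. The round-trip check
at the end is the kind of property you would hand to PropEr with generated
states instead of fixed ones.

```python
import copy

# Sentinel for "key absent"; a real store would need a serializable marker.
_MISSING = object()


def diff(old, new, path=()):
    """Return (path, old_value, new_value) for every place the trees differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(set(old) | set(new)):
            changes += diff(old.get(key, _MISSING),
                            new.get(key, _MISSING),
                            path + (key,))
        return changes
    if old == new:
        return []
    # Leaf changed, or a subtree was added, removed or replaced wholesale.
    return [(path, old, new)]


def apply_patch(state, patch):
    """Apply a patch produced by diff() to a copy of state (root must be a dict)."""
    result = copy.deepcopy(state)
    for path, _old, new in patch:
        node = result
        for key in path[:-1]:
            node = node[key]  # intermediate nodes exist in both trees
        if new is _MISSING:
            node.pop(path[-1], None)
        else:
            node[path[-1]] = copy.deepcopy(new)
    return result


old = {"state": "requested", "donor": {"id": "A", "ack": False}, "tmp": 1}
new = {"state": "scheduled", "donor": {"id": "A", "ack": True}}

patch = diff(old, new)
# Round-trip property: patching the old state must yield the new state.
assert apply_patch(old, patch) == new
```

Storing only the patch per transition keeps the full history cheap, and
because each entry also carries the old value, a disputed transition can be
inspected later without replaying from the start.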

Regards,
Motiejus