[erlang-questions] "actor database" - architectural strategy question

Mon Feb 17 15:50:31 CET 2014

A document is a trace of events. These events record edits to the document,
and when we replay all of them, we obtain the final document state.
Infinite undo is possible by replaying the log only up to an earlier point
(point-in-time recovery). An actor is a handler that applies events to a
state in order to obtain a new state.
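
To make the replay idea concrete, here is a minimal sketch in Python (chosen
only for brevity; in Erlang the same fold would sit inside the document
process). The Event record, its "insert"/"delete" kinds, and the function
names are made up for illustration and are not taken from any real system.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    seq: int          # position of the event in the log
    kind: str         # "insert" or "delete" (hypothetical event kinds)
    pos: int          # character offset the edit applies to
    text: str = ""    # text to insert (insert events only)
    length: int = 0   # number of characters to remove (delete events only)


def apply_event(state: str, ev: Event) -> str:
    """The 'actor' part: take a state and an event, return the new state."""
    if ev.kind == "insert":
        return state[:ev.pos] + ev.text + state[ev.pos:]
    if ev.kind == "delete":
        return state[:ev.pos] + state[ev.pos + ev.length:]
    raise ValueError(f"unknown event kind: {ev.kind}")


def replay(events, upto=None, initial=""):
    """Fold the log into a state; 'upto' gives point-in-time recovery."""
    selected = (e for e in events if upto is None or e.seq <= upto)
    return reduce(apply_event, selected, initial)


log = [
    Event(1, "insert", 0, text="hello"),
    Event(2, "insert", 5, text=" world"),
    Event(3, "delete", 0, length=6),
]

current = replay(log)            # full replay gives the final document
undone = replay(log, upto=2)     # "undo" the last edit by replaying less
```

Undo never mutates the log here; it only chooses how much of the log to
fold.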

Events are persisted to an event log in write-ahead-log (WAL) fashion: an
event is written to the log before it is applied. So even if the system
dies, we can rebuild its state safely by replaying the log. Once in a while,
live processes checkpoint their state to disk so they can boot up faster
than by replaying from day 0.
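
A rough sketch of the log-then-apply and checkpoint cycle, again in Python
and under stated assumptions: the file names, the JSON-lines format, and the
Document class are all hypothetical, events only append text to keep the
example small, and a torn final log line after a crash is not handled.

```python
import json
import os
import tempfile


class Document:
    def __init__(self, directory):
        self.log_path = os.path.join(directory, "events.log")
        self.ckpt_path = os.path.join(directory, "checkpoint.json")
        self.state = ""
        self.seq = 0
        self._recover()

    def _apply(self, event):
        self.state += event["text"]
        self.seq = event["seq"]

    def _recover(self):
        # Start from the latest checkpoint, if there is one ...
        if os.path.exists(self.ckpt_path):
            with open(self.ckpt_path) as f:
                ckpt = json.load(f)
            self.state, self.seq = ckpt["state"], ckpt["seq"]
        # ... then replay only the events logged after it.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    event = json.loads(line)
                    if event["seq"] > self.seq:
                        self._apply(event)

    def append(self, text):
        event = {"seq": self.seq + 1, "text": text}
        # Write-ahead: the event is forced to disk before it is applied.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(event)

    def checkpoint(self):
        # Write to a temp file and rename, so a crash leaves the old
        # checkpoint intact.
        tmp = self.ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"state": self.state, "seq": self.seq}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.ckpt_path)


workdir = tempfile.mkdtemp()
doc = Document(workdir)
doc.append("a")
doc.append("b")
doc.checkpoint()
doc.append("c")

restarted = Document(workdir)   # simulates a restart after a crash
```

The restarted document loads the checkpoint and then replays only the one
event that was logged after it.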

Multiple edits to the same document can be handled by operational transforms:

http://en.wikipedia.org/wiki/Operational_transformation
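
As a toy illustration of the idea, here is the simplest case: two concurrent
text inserts. This is a sketch only; the Insert record and the site-id
tie-break are assumptions made for the example, and a real system also needs
deletes and a way to order more than two concurrent operations.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int     # offset in the document the author saw
    text: str    # text being inserted
    site: int    # id of the originating site, used only to break ties


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite 'op' so it can be applied after 'against' has been applied."""
    if op.pos < against.pos or (op.pos == against.pos and op.site < against.site):
        return op  # 'against' landed to the right; nothing to shift
    return replace(op, pos=op.pos + len(against.text))


base = "abc"
a = Insert(pos=1, text="X", site=1)   # made concurrently on site 1
b = Insert(pos=2, text="Y", site=2)   # made concurrently on site 2

# Each site applies its own edit first, then the transformed remote edit.
site_1 = apply_op(apply_op(base, a), transform(b, a))
site_2 = apply_op(apply_op(base, b), transform(a, b))
```

Both sites end up with the same text even though they applied the edits in
different orders, which is the property the transform exists to provide.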

Idle documents checkpoint themselves to disk and terminate after a while.
Documents register themselves in gproc, and if a document is not present in
gproc, you go to a manager, which sets it up either from disk or by creating
a new document.
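
The lookup-or-start pattern can be sketched as follows. This is Python and
does not use gproc at all: a plain dict stands in for the registry, another
dict stands in for disk, and the Manager class is hypothetical. It also
simplifies one thing: here the manager sweeps idle documents, whereas in the
scheme above each document process times itself out.

```python
class Manager:
    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.registry = {}   # doc_id -> live document (stand-in for gproc)
        self.disk = {}       # doc_id -> checkpointed state (stand-in for disk)

    def lookup(self, doc_id, now):
        """Return the live document, starting it if it is not registered."""
        doc = self.registry.get(doc_id)
        if doc is None:
            # Not registered: restore from "disk" or form a new document.
            doc = {"state": self.disk.get(doc_id, ""), "last_used": now}
            self.registry[doc_id] = doc
        doc["last_used"] = now
        return doc

    def sweep(self, now):
        """Checkpoint and unregister documents that have been idle too long."""
        for doc_id, doc in list(self.registry.items()):
            if now - doc["last_used"] >= self.idle_timeout:
                self.disk[doc_id] = doc["state"]
                del self.registry[doc_id]


manager = Manager(idle_timeout=60)
plan = manager.lookup("plan", now=0)
plan["state"] = "v1"

manager.sweep(now=30)     # still fresh, stays registered
still_live = "plan" in manager.registry

manager.sweep(now=60)     # idle long enough: checkpointed and removed
revived = manager.lookup("plan", now=100)
```

Time is passed in explicitly so the behaviour is easy to follow; a real
process would use its own timer instead.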

For easy storage, you can use a single table in a database for the log.
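
One possible shape for such a table, sketched with Python's built-in sqlite3
module. The table name, column names, and helper functions are assumptions
for illustration; any database with an ordered per-document key would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event_log (
        doc_id  TEXT    NOT NULL,
        seq     INTEGER NOT NULL,   -- per-document sequence number
        payload TEXT    NOT NULL,   -- serialized event
        PRIMARY KEY (doc_id, seq)
    )
    """
)


def append_event(conn, doc_id, payload):
    """Append one event to a document's log and return its sequence number."""
    with conn:  # commits on success, rolls back on error
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) + 1 FROM event_log WHERE doc_id = ?",
            (doc_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO event_log (doc_id, seq, payload) VALUES (?, ?, ?)",
            (doc_id, next_seq, payload),
        )
    return next_seq


def events_for(conn, doc_id, after_seq=0):
    """Read a document's events in order, optionally after a checkpoint."""
    rows = conn.execute(
        "SELECT seq, payload FROM event_log"
        " WHERE doc_id = ? AND seq > ? ORDER BY seq",
        (doc_id, after_seq),
    )
    return rows.fetchall()


append_event(conn, "plan", "insert:0:hello")
append_event(conn, "plan", "insert:5: world")
append_event(conn, "budget", "insert:0:42")
```

The (doc_id, seq) primary key keeps each document's log ordered and makes
"everything after checkpoint N" a cheap range query.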

On Mon, Feb 17, 2014 at 3:20 PM, Miles Fidelman
<mfidelman@REDACTED> wrote:

> [Enough with the threads on Erlang angst for a while - time for some real
> questions :-) ]
>
> BACKGROUND:
> A lot of what I do is systems engineering, and a lot of that ends up in
> the realm of technology assessment - picking the right platform and tools
> for a particular system.  My dabblings in Erlang are largely in that
> category - I keep seeing it as potentially useful for a class of systems,
> keep experimenting with it, done a couple proof-of-concept efforts, but
> haven't built an operational system at scale with it (yet).  The focus, so
> far, has been in modeling and simulation (I first discovered Erlang when
> chasing R&D contracts for a firm that built simulation engines for military
> trainers.  I was flabbergasted to discover that everything was written in
> C++, every simulated entity was an object, with 4 main loops threading
> through every object, 20 times a second.  Talk about spaghetti code.
>  Coming from a data comm. protocol/network background - where we'd spawn a
> process for everything - I asked the obvious question, and was told that
> context switches would bring a 10,000 entity simulation to its knees.  My
> instinctual response was "bullshit" - and went digging into the technology
> for massive concurrency, and discovered Erlang.)
>
> Anyway....  For years, I've been finding myself in situations, and on
> projects, that have a common characteristic of linked documents that change
> a lot - in the general arena of planning and workflow. Lots of people, each
> editing different parts of different documents - with changes rippling
> through the collection.  Think linked spreadsheets, tiered project plans,
> multi-level engineering documents with lots of inter-dependencies.  To be
> more concrete: systems engineering documents, large proposals, business
> planning systems, command and control systems.
>
> Add in requirements for disconnected operation that lead to
> distribution/replication requirements rather than keeping single, central
> copies of things (as the librarians like to say, "Lots of Copies Keeps
> Stuff Safe").
>
> So far we've always taken conventional approaches - ranging from manual
> paper shuffling and xeroxing, to file servers with manual organization, to
> some of MS Office's document linking capabilities, to document databases
> and sharepoint.  And played with some XML database technologies.
>
> But.... I keep thinking that there are a set of underlying functions that
> beg for better tools - something like a distributed CVS that's optimized
> for planning documents rather than software (or perhaps something like a
> modernized Lotus Notes).
>
> And I keep thinking that the obvious architectural model is to treat each
> document (maybe each page) as an actor ("smart documents" if you will),
> with communication through publish-subscribe mechanisms. Interact with a
> (copy of) a document, changes get pushed to groups of documents via a
> pub-sub mechanism.  (Not unlike actor based simulation approaches.)
>
> And, of course, when I think actors, I think Erlang.  The obvious
> conceptualization is "every document is an actor."
>
> At which point an obvious question comes up:  How to handle long-term
> persistence, for large numbers of inactive entities.
>
> But... when I go looking for examples of systems that might be built this
> way, I keep finding that, even in Erlang-based systems, persistence is
> handled in fairly conventional ways:
> - One might think that CouchDB treats every document as an actor, but
> think again
> - Paolo Negri has given some great presentations on how Wooga implements
> large-scale social gaming - and they implement an actor per session - but
> when a user goes off-line they push state into a more conventional database
>  (then initialize a gen_server from the database, when the user comes back
> online)
>
> At which point the phrase "actor-oriented database" keeps coming back to
> mind, with the obvious analogy to "object-oriented databases."  I.e.,
> something with the persistence and other characteristics of a database,
> where the contents are actors - with all the characteristics and
> functionality of those actors preserved while stored in the database.
>
> ON TO THE QUESTIONS:
> I have a pretty good understanding of how one would build things like
> simulations, or protocol servers, with Erlang - not so much how one might
> build something with long-term persistence - which leads to some questions
> (some, probably naive):
>
> 1. So far, I haven't seen anything that actually looks like an
> "actor-oriented database."  Document databases implemented in Erlang, yes
> (e.g., CouchDB), but every example I find ultimately pushes persistent data
> into files or a more conventional database of some sort.  Can anybody point
> to an example of something that looks more like "storing actors in a
> database?"
> - It strikes me that the core issues with doing so have to do with
> maintaining "aliveness" - i.e., dealing with addressability, routing
> messages to a stored actor, waking up after a timeout (i.e., the equivalent
> of triggers)
>
> 2. One obvious (if simplistic) thought: Does one really need to think in
> terms of a "database" at all - or might this problem be approached simply
> by creating each document as an Erlang process, and keeping it around
> forever?  Most of what I've seen built in Erlang focuses on relatively
> short-lived actors - I'd be really interested in comments on:
> - limitations/issues in persisting 100s of 1000s, or maybe millions of
> actors, for extended periods of time (years, or decades)
> - are there any tools/models for migrating (swapping?) inactive processes
> dynamically to/from disk storage
>
> 3. What about backup for the state of a process?  'Let it crash' is great
> for servers supporting a reliable protocol, not so great for an actor that
> has  internal state that has to be preserved (like a simulated tank, or a
> "smart document"). Pushing into a database is obvious, but...
> - are there any good models for saving/restoring state within a tree of
> supervised processes?
> - what about models for synchronizing state across replicated copies of
> processes running on different nodes?
> - what about backup/restore of entire Erlang VMs (including anything that
> might be swapped out onto disk)
>
> 4. For communications between/among actors:  Erlang is obviously excellent
> for writing pub-sub engines (RabbitMQ and ejabberd come to mind), but what
> about pub-sub or multicast/broadcast models or messaging between Erlang
> processes?  Are there any good libraries for defining/managing process
> groups, and doing multicast or broadcast messaging to/among a group of
> processes?
>
> Thank you very much for any pointers or thoughts.
>
> Miles Fidelman
>
>
>
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.   .... Yogi Berra
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>

-- 
J.