[erlang-questions] "actor database" - architectural strategy question

Mon Feb 17 17:30:19 CET 2014

http://www.actordb.com/

Sergej

On Feb 17, 2014, at 3:20 PM, Miles Fidelman wrote:

> [Enough with the threads on Erlang angst for a while - time for some real questions :-) ]
> 
> BACKGROUND:
> A lot of what I do is systems engineering, and a lot of that ends up in the realm of technology assessment - picking the right platform and tools for a particular system.  My dabblings in Erlang are largely in that category - I keep seeing it as potentially useful for a class of systems, keep experimenting with it, and have done a couple of proof-of-concept efforts, but haven't built an operational system at scale with it (yet).  The focus, so far, has been on modeling and simulation.  (I first discovered Erlang when chasing R&D contracts for a firm that built simulation engines for military trainers.  I was flabbergasted to discover that everything was written in C++, every simulated entity was an object, with 4 main loops threading through every object, 20 times a second.  Talk about spaghetti code.  Coming from a data comm. protocol/network background - where we'd spawn a process for everything - I asked the obvious question, and was told that context switches would bring a 10,000 entity simulation to its knees.  My instinctual response was "bullshit" - and I went digging into the technology for massive concurrency, and discovered Erlang.)
> 
> Anyway....  For years, I've been finding myself in situations, and on projects, that have a common characteristic of linked documents that change a lot - in the general arena of planning and workflow. Lots of people, each editing different parts of different documents - with changes rippling through the collection.  Think linked spreadsheets, tiered project plans, multi-level engineering documents with lots of inter-dependencies.  To be more concrete: systems engineering documents, large proposals, business planning systems, command and control systems.
> 
> Add in requirements for disconnected operation that lead to distribution/replication requirements rather than keeping single, central copies of things (as the librarians like to say, "Lots of Copies Keeps Stuff Safe").
> 
> So far we've always taken conventional approaches - ranging from manual paper shuffling and xeroxing, to file servers with manual organization, to some of MS Office's document linking capabilities, to document databases and SharePoint.  We've also played with some XML database technologies.
> 
> But.... I keep thinking that there is a set of underlying functions that begs for better tools - something like a distributed CVS that's optimized for planning documents rather than software (or perhaps something like a modernized Lotus Notes).
> 
> And I keep thinking that the obvious architectural model is to treat each document (maybe each page) as an actor ("smart documents" if you will), with communication through publish-subscribe mechanisms. Interact with (a copy of) a document, and changes get pushed to groups of documents via a pub-sub mechanism.  (Not unlike actor-based simulation approaches.)
> 
> And, of course, when I think actors, I think Erlang.  The obvious conceptualization is "every document is an actor."
> 
> At which point an obvious question comes up:  How to handle long-term persistence, for large numbers of inactive entities.
> 
> But... when I go looking for examples of systems that might be built this way, I keep finding that, even in Erlang-based systems, persistence is handled in fairly conventional ways:
> - One might think that CouchDB treats every document as an actor, but think again
> - Paolo Negri has given some great presentations on how Wooga implements large-scale social gaming - and they implement an actor per session - but when a user goes off-line they push state into a more conventional database (then initialize a gen_server from the database when the user comes back online)
> 
> At which point the phrase "actor-oriented database" keeps coming back to mind, with the obvious analogy to "object-oriented databases."  I.e., something with the persistence and other characteristics of a database, where the contents are actors - with all the characteristics and functionality of those actors preserved while stored in the database.
> 
> ON TO THE QUESTIONS:
> I have a pretty good understanding of how one would build things like simulations, or protocol servers, with Erlang - not so much how one might build something with long-term persistence - which leads to some questions (some, probably naive):
> 
> 1. So far, I haven't seen anything that actually looks like an "actor-oriented database."  Document databases implemented in Erlang, yes (e.g., CouchDB), but every example I find ultimately pushes persistent data into files or a more conventional database of some sort.  Can anybody point to an example of something that looks more like "storing actors in a database?"
> - It strikes me that the core issues with doing so have to do with maintaining "aliveness" - i.e., dealing with addressability, routing messages to a stored actor, waking up after a timeout (i.e., the equivalent of triggers)
> 
> 2. One obvious (if simplistic) thought: Does one really need to think in terms of a "database" at all - or might this problem be approached simply by creating each document as an Erlang process, and keeping it around forever?  Most of what I've seen built in Erlang focuses on relatively short-lived actors - I'd be really interested in comments on:
> - limitations/issues in persisting 100s of 1000s, or maybe millions of actors, for extended periods of time (years, or decades)
> - are there any tools/models for migrating (swapping?) inactive processes dynamically to/from disk storage?
> 
> 3. What about backup for the state of a process?  'Let it crash' is great for servers supporting a reliable protocol, not so great for an actor that has internal state that has to be preserved (like a simulated tank, or a "smart document"). Pushing into a database is obvious, but...
> - are there any good models for saving/restoring state within a tree of supervised processes?
> - what about models for synchronizing state across replicated copies of processes running on different nodes?
> - what about backup/restore of entire Erlang VMs (including anything that might be swapped out onto disk)?
> 
> 4. For communications between/among actors:  Erlang is obviously excellent for writing pub-sub engines (RabbitMQ and ejabberd come to mind), but what about pub-sub or multicast/broadcast models for messaging between Erlang processes?  Are there any good libraries for defining/managing process groups, and doing multicast or broadcast messaging to/among a group of processes?
> 
> Thank you very much for any pointers or thoughts.
> 
> Miles Fidelman
> 
> 
> 
> 
> -- 
> In theory, there is no difference between theory and practice.
> In practice, there is.   .... Yogi Berra
