[erlang-questions] "actor database" - architectural strategy question

Mon Feb 17 17:47:25 CET 2014

Well thanks, and there are some interesting ideas there - particularly 
re. addressing, but...

"A distributed SQL database with the scalability of a KV store."
that uses SQLite as the back end.

Not quite what I'm looking for.  It's not really a "database of actors" in 
the way that, say, GemStone is an "object-oriented database."
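
To make the distinction concrete, here is a rough sketch of what I mean by 
a store whose contents stay addressable as actors even while they sit on 
disk.  It's Python rather than Erlang purely for brevity, and every name in 
it (DocumentActor, ActorStore, passivate) is made up for illustration, not 
taken from ActorDB, GemStone, or any existing library.  The dict standing 
in for disk storage is likewise an assumption.

```python
import json


class DocumentActor:
    """A hypothetical 'smart document': private state plus a message handler."""

    def __init__(self, address, state=None):
        self.address = address
        self.state = state or {"text": "", "revision": 0}

    def handle(self, message):
        # Only one message type here; a real actor would handle many.
        if message["type"] == "edit":
            self.state["text"] = message["text"]
            self.state["revision"] += 1
        return self.state["revision"]


class ActorStore:
    """Routes messages by address, waking stored actors on demand."""

    def __init__(self):
        self.live = {}    # address -> in-memory actor
        self.stored = {}  # address -> serialized state (stand-in for disk)

    def create(self, address):
        self.live[address] = DocumentActor(address)

    def passivate(self, address):
        # Move an idle actor out of memory, keeping only its serialized state.
        actor = self.live.pop(address)
        self.stored[address] = json.dumps(actor.state)

    def send(self, address, message):
        # The sender never needs to know whether the actor is live or stored.
        if address not in self.live:
            state = json.loads(self.stored.pop(address))
            self.live[address] = DocumentActor(address, state)
        return self.live[address].handle(message)
```

The point of the sketch is that send() hides whether the target is live or 
stored: the address keeps working either way, which is the "aliveness" 
property I'm after.  It says nothing about timers, replication, or crash 
recovery.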

Sergej Jurecko wrote:
> http://www.actordb.com/
>
>
> Sergej
>
> On Feb 17, 2014, at 3:20 PM, Miles Fidelman wrote:
>
>> [Enough with the threads on Erlang angst for a while - time for some 
>> real questions :-) ]
>>
>> BACKGROUND:
>> A lot of what I do is systems engineering, and a lot of that ends up 
>> in the realm of technology assessment - picking the right platform 
>> and tools for a particular system.  My dabblings in Erlang are largely 
>> in that category - I keep seeing it as potentially useful for a class 
>> of systems, keep experimenting with it, and have done a couple of 
>> proof-of-concept efforts, but haven't built an operational system at 
>> scale with it (yet).  The focus, so far, has been in modeling and 
>> simulation (I first discovered Erlang when chasing R&D contracts for 
>> a firm that built simulation engines for military trainers.  I was 
>> flabbergasted to discover that everything was written in C++, every 
>> simulated entity was an object, with 4 main loops threading through 
>> every object, 20 times a second.  Talk about spaghetti code.  Coming 
>> from a data comm. protocol/network background - where we'd spawn a 
>> process for everything - I asked the obvious question, and was told 
>> that context switches would bring a 10,000 entity simulation to its 
>> knees.  My instinctual response was "bullshit" - and I went digging 
>> into the technology for massive concurrency, and discovered Erlang.)
>>
>> Anyway....  For years, I've been finding myself in situations, and on 
>> projects, that have a common characteristic of linked documents that 
>> change a lot - in the general arena of planning and workflow. Lots of 
>> people, each editing different parts of different documents - with 
>> changes rippling through the collection.  Think linked spreadsheets, 
>> tiered project plans, multi-level engineering documents with lots of 
>> inter-dependencies.  To be more concrete: systems engineering 
>> documents, large proposals, business planning systems, command and 
>> control systems.
>>
>> Add in requirements for disconnected operation that lead to 
>> distribution/replication requirements rather than keeping single, 
>> central copies of things (as the librarians like to say, "Lots of 
>> Copies Keeps Stuff Safe").
>>
>> So far we've always taken conventional approaches - ranging from 
>> manual paper shuffling and xeroxing, to file servers with manual 
>> organization, to some of MS Office's document linking capabilities, 
>> to document databases and SharePoint.  And we've played with some XML 
>> database technologies.
>>
>> But.... I keep thinking that there is a set of underlying functions 
>> that beg for better tools - something like a distributed CVS that's 
>> optimized for planning documents rather than software (or perhaps 
>> something like a modernized Lotus Notes).
>>
>> And I keep thinking that the obvious architectural model is to treat 
>> each document (maybe each page) as an actor ("smart documents" if you 
>> will), with communication through publish-subscribe mechanisms. 
>> Interact with (a copy of) a document, and changes get pushed to groups 
>> of documents via a pub-sub mechanism.  (Not unlike actor-based 
>> simulation approaches.)
>>
>> And, of course, when I think actors, I think Erlang.  The obvious 
>> conceptualization is "every document is an actor."
>>
>> At which point an obvious question comes up:  How to handle long-term 
>> persistence for large numbers of inactive entities?
>>
>> But... when I go looking for examples of systems that might be built 
>> this way, I keep finding that, even in Erlang-based systems, 
>> persistence is handled in fairly conventional ways:
>> - One might think that CouchDB treats every document as an actor, but 
>> think again
>> - Paolo Negri has given some great presentations on how Wooga 
>> implements large-scale social gaming - and they implement an actor 
>> per session - but when a user goes off-line they push state into a 
>> more conventional database (then initialize a gen_server from the 
>> database when the user comes back online)
>>
>> At which point the phrase "actor-oriented database" keeps coming back 
>> to mind, with the obvious analogy to "object-oriented databases." 
>> I.e., something with the persistence and other characteristics of a 
>> database, where the contents are actors - with all the 
>> characteristics and functionality of those actors preserved while 
>> stored in the database.
>>
>> ON TO THE QUESTIONS:
>> I have a pretty good understanding of how one would build things like 
>> simulations, or protocol servers, with Erlang - not so much how one 
>> might build something with long-term persistence - which leads to 
>> some questions (some, probably naive):
>>
>> 1. So far, I haven't seen anything that actually looks like an 
>> "actor-oriented database."  Document databases implemented in Erlang, 
>> yes (e.g., CouchDB), but every example I find ultimately pushes 
>> persistent data into files or a more conventional database of some 
>> sort.  Can anybody point to an example of something that looks more 
>> like "storing actors in a database"?
>> - It strikes me that the core issues with doing so have to do with 
>> maintaining "aliveness" - i.e., dealing with addressability, routing 
>> messages to a stored actor, waking up after a timeout (i.e., the 
>> equivalent of triggers)
>>
>> 2. One obvious (if simplistic) thought: Does one really need to think 
>> in terms of a "database" at all - or might this problem be approached 
>> simply by creating each document as an Erlang process, and keeping it 
>> around forever?  Most of what I've seen built in Erlang focuses on 
>> relatively short-lived actors - I'd be really interested in comments on:
>> - limitations/issues in persisting 100s of 1000s, or maybe millions 
>> of actors, for extended periods of time (years, or decades)
>> - are there any tools/models for migrating (swapping?) inactive 
>> processes dynamically to/from disk storage
>>
>> 3. What about backup for the state of a process?  'Let it crash' is 
>> great for servers supporting a reliable protocol, not so great for an 
>> actor that has internal state that has to be preserved (like a 
>> simulated tank, or a "smart document"). Pushing into a database is 
>> obvious, but...
>> - are there any good models for saving/restoring state within a tree 
>> of supervised processes?
>> - what about models for synchronizing state across replicated copies 
>> of processes running on different nodes?
>> - what about backup/restore of entire Erlang VMs (including anything 
>> that might be swapped out onto disk)?
>>
>> 4. For communications between/among actors:  Erlang is obviously 
>> excellent for writing pub-sub engines (RabbitMQ and ejabberd come to 
>> mind), but what about pub-sub or multicast/broadcast models for 
>> messaging between Erlang processes?  Are there any good libraries for 
>> defining/managing process groups, and doing multicast or broadcast 
>> messaging to/among a group of processes?
>>
>> Thank you very much for any pointers or thoughts.
>>
>> Miles Fidelman
>>
>>
>>
>>
>> -- 
>> In theory, there is no difference between theory and practice.
>> In practice, there is.   .... Yogi Berra
>>
>

-- 
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra