[erlang-questions] "actor database" - architectural strategy question
Miles Fidelman
mfidelman@REDACTED
Mon Feb 17 17:47:25 CET 2014
Well thanks, and there are some interesting ideas there - particularly
re. addressing, but...
"A distributed SQL database with the scalability of a KV store."
and it uses SQLite as the back end.
Not quite what I'm looking for. Not really a "database of actors" in
the way that, say, GemStone is an "object-oriented database."
Sergej Jurecko wrote:
> http://www.actordb.com/
>
>
> Sergej
>
> On Feb 17, 2014, at 3:20 PM, Miles Fidelman wrote:
>
>> [Enough with the threads on Erlang angst for a while - time for some
>> real questions :-) ]
>>
>> BACKGROUND:
>> A lot of what I do is systems engineering, and a lot of that ends up
>> in the realm of technology assessment - picking the right platform
>> and tools for a particular system. My dabblings in Erlang are largely
>> in that category - I keep seeing it as potentially useful for a class
>> of systems, keep experimenting with it, have done a couple of
>> proof-of-concept efforts, but haven't built an operational system at
>> scale with it (yet). The focus, so far, has been in modeling and
>> simulation (I first discovered Erlang when chasing R&D contracts for
>> a firm that built simulation engines for military trainers. I was
>> flabbergasted to discover that everything was written in C++, every
>> simulated entity was an object, with 4 main loops threading through
>> every object, 20 times a second. Talk about spaghetti code. Coming
>> from a data comm. protocol/network background - where we'd spawn a
>> process for everything - I asked the obvious question, and was told
>> that context switches would bring a 10,000 entity simulation to its
>> knees. My instinctual response was "bullshit" - and I went digging
>> into the technology for massive concurrency, and discovered Erlang.)
>>
>> Anyway.... For years, I've been finding myself in situations, and on
>> projects, that have a common characteristic of linked documents that
>> change a lot - in the general arena of planning and workflow. Lots of
>> people, each editing different parts of different documents - with
>> changes rippling through the collection. Think linked spreadsheets,
>> tiered project plans, multi-level engineering documents with lots of
>> inter-dependencies. To be more concrete: systems engineering
>> documents, large proposals, business planning systems, command and
>> control systems.
>>
>> Add in requirements for disconnected operation that lead to
>> distribution/replication requirements rather than keeping single,
>> central copies of things (as the librarians like to say, "Lots of
>> Copies Keeps Stuff Safe").
>>
>> So far we've always taken conventional approaches - ranging from
>> manual paper shuffling and xeroxing, to file servers with manual
>> organization, to some of MS Office's document linking capabilities,
>> to document databases and SharePoint. We've also played with some XML
>> database technologies.
>>
>> But.... I keep thinking that there is a set of underlying functions
>> that beg for better tools - something like a distributed CVS that's
>> optimized for planning documents rather than software (or perhaps
>> something like a modernized Lotus Notes).
>>
>> And I keep thinking that the obvious architectural model is to treat
>> each document (maybe each page) as an actor ("smart documents" if you
>> will), with communication through publish-subscribe mechanisms.
>> Interact with (a copy of) a document, and changes get pushed to groups
>> of documents via a pub-sub mechanism. (Not unlike actor-based
>> simulation approaches.)
>>
>> And, of course, when I think actors, I think Erlang. The obvious
>> conceptualization is "every document is an actor."
>>
>> At which point an obvious question comes up: how to handle long-term
>> persistence for large numbers of inactive entities?
>>
>> But... when I go looking for examples of systems that might be built
>> this way, I keep finding that, even in Erlang-based systems,
>> persistence is handled in fairly conventional ways:
>> - One might think that CouchDB treats every document as an actor, but
>> think again
>> - Paolo Negri has given some great presentations on how Wooga
>> implements large-scale social gaming - and they implement an actor
>> per session - but when a user goes off-line they push state into a
>> more conventional database (then initialize a gen_server from the
>> database, when the user comes back online)
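
To make the pattern in the Wooga bullet concrete, here is a minimal sketch of a process that reloads its state from storage when started and writes it back when it goes idle. It is not Wooga's code: the module name doc_actor, the 60-second idle limit, and the flat file standing in for a real database are all assumptions made for illustration.

```erlang
-module(doc_actor).
-behaviour(gen_server).

-export([start_link/1, update/3, fetch/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% Hypothetical idle limit: passivate after 60 s without a message.
-define(IDLE_MS, 60000).

%% DocId is assumed to be a string that is safe to use in a file name.
start_link(DocId) ->
    gen_server:start_link(?MODULE, DocId, []).

update(Pid, Key, Value) -> gen_server:call(Pid, {update, Key, Value}).
fetch(Pid, Key) -> gen_server:call(Pid, {fetch, Key}).

%% Rehydrate: read the last saved state, or start empty.
init(DocId) ->
    %% Trap exits so terminate/2 also runs on a supervisor shutdown.
    process_flag(trap_exit, true),
    Fields = case file:read_file(path(DocId)) of
                 {ok, Bin} -> binary_to_term(Bin);
                 {error, enoent} -> #{}
             end,
    {ok, {DocId, Fields}, ?IDLE_MS}.

handle_call({update, K, V}, _From, {DocId, Fields}) ->
    {reply, ok, {DocId, Fields#{K => V}}, ?IDLE_MS};
handle_call({fetch, K}, _From, {_DocId, Fields} = State) ->
    {reply, maps:find(K, Fields), State, ?IDLE_MS}.

handle_cast(_Msg, State) ->
    {noreply, State, ?IDLE_MS}.

%% Nothing arrived for ?IDLE_MS: stop, which triggers terminate/2.
handle_info(timeout, State) ->
    {stop, normal, State};
handle_info(_Other, State) ->
    {noreply, State, ?IDLE_MS}.

%% Passivate: write the state out before the process goes away.
terminate(_Reason, {DocId, Fields}) ->
    ok = file:write_file(path(DocId), term_to_binary(Fields)).

path(DocId) -> "doc_" ++ DocId ++ ".state".
```

Starting the process, calling doc_actor:update(Pid, title, <<"Draft">>), stopping it with gen_server:stop(Pid), and starting it again under the same id should make doc_actor:fetch(NewPid, title) return {ok, <<"Draft">>}. What the sketch leaves out is exactly the "aliveness" problem: something still has to notice that a message is addressed to a stopped actor and start it again.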
>>
>> At which point the phrase "actor-oriented database" keeps coming back
>> to mind, with the obvious analogy to "object-oriented databases."
>> I.e., something with the persistence and other characteristics of a
>> database, where the contents are actors - with all the
>> characteristics and functionality of those actors preserved while
>> stored in the database.
>>
>> ON TO THE QUESTIONS:
>> I have a pretty good understanding of how one would build things like
>> simulations, or protocol servers, with Erlang - not so much how one
>> might build something with long-term persistence - which leads to
>> some questions (some, probably naive):
>>
>> 1. So far, I haven't seen anything that actually looks like an
>> "actor-oriented database." Document databases implemented in Erlang,
>> yes (e.g., CouchDB), but every example I find ultimately pushes
>> persistent data into files or a more conventional database of some
>> sort. Can anybody point to an example of something that looks more
>> like "storing actors in a database"?
>> - It strikes me that the core issues with doing so have to do with
>> maintaining "aliveness" - i.e., dealing with addressability, routing
>> messages to a stored actor, waking up after a timeout (i.e., the
>> equivalent of triggers)
>>
>> 2. One obvious (if simplistic) thought: Does one really need to think
>> in terms of a "database" at all - or might this problem be approached
>> simply by creating each document as an Erlang process, and keeping it
>> around forever? Most of what I've seen built in Erlang focuses on
>> relatively short-lived actors - I'd be really interested in comments on:
>> - limitations/issues in persisting 100s of 1000s, or maybe millions
>> of actors, for extended periods of time (years, or decades)
>> - are there any tools/models for migrating (swapping?) inactive
>> processes dynamically to/from disk storage?
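
One partial answer on the memory side, as a sketch only: a gen_server can ask to be hibernated between messages by returning hibernate from its callbacks. Hibernation does not move anything to disk; it discards the call stack and garbage-collects the heap, so a mostly idle process takes less memory while it waits. The module name idle_doc and the counter state are hypothetical.

```erlang
-module(idle_doc).
-behaviour(gen_server).

-export([start_link/0, touch/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() -> gen_server:start_link(?MODULE, [], []).

%% Increments a counter and returns its new value.
touch(Pid) -> gen_server:call(Pid, touch).

%% Returning hibernate makes gen_server hibernate the process
%% until the next message arrives.
init([]) ->
    {ok, 0, hibernate}.

handle_call(touch, _From, Count) ->
    {reply, Count + 1, Count + 1, hibernate}.

handle_cast(_Msg, Count) ->
    {noreply, Count, hibernate}.
```

Hibernating after every message costs CPU on wake-up, so this only makes sense for processes that are touched rarely, which is the case described here.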
>>
>> 3. What about backup for the state of a process? 'Let it crash' is
>> great for servers supporting a reliable protocol, not so great for an
>> actor that has internal state that has to be preserved (like a
>> simulated tank, or a "smart document"). Pushing into a database is
>> obvious, but...
>> - are there any good models for saving/restoring state within a tree
>> of supervised processes?
>> - what about models for synchronizing state across replicated copies
>> of processes running on different nodes?
>> - what about backup/restore of entire Erlang VMs (including anything
>> that might be swapped out onto disk)?
>>
>> 4. For communications between/among actors: Erlang is obviously
>> excellent for writing pub-sub engines (RabbitMQ and ejabberd come to
>> mind), but what about pub-sub or multicast/broadcast models or
>> messaging between Erlang processes? Are there any good libraries for
>> defining/managing process groups, and doing multicast or broadcast
>> messaging to/among a group of processes?
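
For reference, OTP ships a process-group module that covers the basic case. At the time of this thread that was pg2; it was removed in OTP 24, and its replacement, pg, has been available since OTP 23. A minimal sketch of pub-sub on top of pg follows. The module name doc_pubsub, the topic name, and the message shape are assumptions, and delivery is plain message passing with no acknowledgement or persistence.

```erlang
-module(doc_pubsub).
-export([subscribe/1, publish/2, demo/0]).

%% Join the calling process to the group named Topic.
subscribe(Topic) ->
    pg:join(Topic, self()).

%% Send the change to every current member of the group.
publish(Topic, Change) ->
    [Pid ! {changed, Topic, Change} || Pid <- pg:get_members(Topic)],
    ok.

%% One subscriber, one published change, echoed back to the caller.
demo() ->
    %% Start the default pg scope unless it is already running.
    case pg:start_link() of
        {ok, _} -> ok;
        {error, {already_started, _}} -> ok
    end,
    Parent = self(),
    spawn_link(fun() ->
        ok = subscribe(budget),
        Parent ! ready,
        receive
            {changed, budget, C} -> Parent ! {got, C}
        end
    end),
    receive ready -> ok end,
    ok = publish(budget, {cell, "B2", 42}),
    receive
        {got, Change} -> Change
    after 1000 ->
        timeout
    end.
```

Groups in pg also span connected nodes, so the same publish/2 reaches subscribers on other nodes that run the same scope; ordering across publishers and delivery to disconnected nodes are not addressed here.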
>>
>> Thank you very much for any pointers or thoughts.
>>
>> Miles Fidelman
>>
>>
>>
>>
>> --
>> In theory, there is no difference between theory and practice.
>> In practice, there is. .... Yogi Berra
>
--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra