[erlang-questions] Massive Numbers of Actors vs. Massive Numbers of Objects vs. ????

Thu Mar 1 16:53:01 CET 2012

Robert Melton wrote:
> On Wed, Feb 29, 2012 at 6:04 PM, Miles Fidelman
> <mfidelman@REDACTED>  wrote:
>> I guess I'm thinking that today's emails often contain HTML and JavaScript -
>> i.e., they're no longer "lifeless."  Concatenating them into mbox files no
>> longer seems like a good way to deal with them.  Seems like each message is
>> better modeled as either an object or an actor, or some hybrid.
> Wait, is the plan to actually execute the javascript in the context of
> Erlang (like Spidermonkey in Riak?)... if so, then I guess you could
> treat them as active things -- else, I need to go with "lifeless" --
> they are opaque text data that has no active component, regardless if
> javascript, python, perl, php or X in that text document.

Short answer is yes.

Longer answer:

I'm thinking through the details of a mail database to replace mbox 
files or mh-style directories - where, for example, a message (say a 
meeting agenda) - buried away in the database, can wake up and remind me 
that my slides are overdue for distribution, or that the meeting is in 
two hours.

Yes, there are lots of ways to do this, but one that strikes me as 
conceptually nice is simply to drop that message into <something> with 
the following characteristics:
- allows for storing/browsing/retrieving messages through a classic 
directory/folder style hierarchy through a mail client
- allows stored messages to wake up and do things
- allows "follow-up" messages to be passed directly to stored items
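
A minimal sketch of those three characteristics, written in Python rather 
than Erlang purely for illustration.  Every name here (Shelf, 
StoredMessage, tick, follow_up) is hypothetical, and logical time is just 
an integer passed in by the caller:

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class StoredMessage:
    msg_id: str
    folder: str                      # folder path, e.g. "work/meetings"
    body: str
    notes: list = field(default_factory=list)

    def on_wake(self, now):
        # What the message "does" when its alarm fires.
        return f"[{self.msg_id}] reminder at t={now}: {self.body}"

    def on_follow_up(self, update):
        # Follow-ups are handed to the stored item itself.
        self.notes.append(update)


class Shelf:
    """Hypothetical store: folder browsing, wake-ups, follow-up delivery."""

    def __init__(self):
        self.messages = {}           # msg_id -> StoredMessage
        self.alarms = []             # min-heap of (wake_time, msg_id)

    def store(self, msg, wake_at=None):
        self.messages[msg.msg_id] = msg
        if wake_at is not None:
            heapq.heappush(self.alarms, (wake_at, msg.msg_id))

    def browse(self, folder):
        # Classic hierarchy: a folder lists itself and its subfolders.
        return sorted(
            m.msg_id for m in self.messages.values()
            if m.folder == folder or m.folder.startswith(folder + "/")
        )

    def follow_up(self, msg_id, update):
        self.messages[msg_id].on_follow_up(update)

    def tick(self, now):
        # Wake every message whose alarm time has been reached.
        fired = []
        while self.alarms and self.alarms[0][0] <= now:
            _, msg_id = heapq.heappop(self.alarms)
            fired.append(self.messages[msg_id].on_wake(now))
        return fired
```

Here the shelf still mediates everything, so this is closer to the 
database-style model than to truly independent processes; it only shows 
the interface such a <something> would need to offer.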

One obvious model is to mediate everything through a database engine of 
some sort.  Another is to treat each message as a first-class, 
addressable, process that's "parked."  I find this model somewhat 
interesting - so I figure it's worth playing out.  The conceptual model 
I keep coming back to is a bookshelf, where each book is active - able 
to flash a light to tell me to go look at it, able to receive update 
messages, able to propagate notes that I make in the margins, and so on.  The 
question becomes: what does the underlying infrastructure look like?

>> Simple case: I distribute a document (say an article).  I then want to
>> distribute an update that gets auto-applied, rather than leaving it to the
>> recipient to "replace paragraph x with <this text>".
> Owner control would be much easier with a reference to a root document
> than to distribute updates to thousands of copies.

I'm a big fan of the LOCKSS model (lots of copies keeps stuff safe).  
Particularly important in situations where connectivity is not 
guaranteed, or where individual copies can get destroyed (say 
military operations).

>> Particularly useful when dealing with formal documents - e.g. project plans
>> - that have a formal update process associated with them.  Email copies of
>> documents that contain their own update rules (who can send updates, how
>> they get applied, conflict resolution rules).  Then email updates that get
>> passed to the original document, which updates itself.  (Not that different
>> than emailing software patches.)
> My team is currently working on a similar problem (but part of the
> core domain-space, not accidental complexity created by us)... if you
> ever design a rules system that can do this well, release it and I
> will buy you a beer the next time you are in DC.  Seriously,
> distributed conflict resolution and surfacing is hard stuff (tm).
> Obviously, all this complexity would be because you want the clients to
> support local customizations of the document, and merge / control the
> parts they want.  Again, if you can create a decent meta-language for
> this type of stuff, I am seriously interested.

We should talk offline about this - I'm always looking for collaborators.
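
For concreteness, a rough sketch of a document that carries its own 
update rules, in Python for illustration only.  The names (ActiveDocument, 
Update, allowed_senders, base_version) are hypothetical, and the conflict 
rule shown, rejecting any update made against a stale version, is just 
the simplest one I could pick, not a proposal for the hard part:

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    sender: str
    paragraph: int       # index of the paragraph to replace
    base_version: int    # version of the copy the sender edited
    text: str


@dataclass
class ActiveDocument:
    paragraphs: list
    allowed_senders: set             # rule: who may send updates
    version: int = 0
    rejected: list = field(default_factory=list)

    def apply(self, update):
        """Apply an emailed update according to the document's own rules."""
        if update.sender not in self.allowed_senders:
            self.rejected.append((update, "sender not authorized"))
            return False
        if update.base_version != self.version:
            # Assumed conflict rule: refuse edits made against an old copy.
            self.rejected.append((update, "stale base version"))
            return False
        if not 0 <= update.paragraph < len(self.paragraphs):
            self.rejected.append((update, "no such paragraph"))
            return False
        self.paragraphs[update.paragraph] = update.text
        self.version += 1
        return True
```

Each recipient's copy would run the same rules locally, which is what 
makes the local-customization and merge cases hard: two copies that have 
diverged will disagree about base_version.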

>> The wrangler concept seems conceptually clumsy - compared to, say, objects
>> and triggers stored in an object database.  What's the actor-oriented
>> equivalent?  (Actually, there is a middle ground - document databases - and
>> CouchDB comes to mind.)
> I don't see why it is clumsy, it is the standard process / data
> pairing that we know and love.  In actual use, I think it is very
> elegant, because all you ever deal with is the wranglers, all the
> magic of storing the documents behind them is hidden by them -- you
> get a simple understandable "living" interface.  Even the complexity
> of scheduling events happens behind the wranglers ... the wranglers
> are your API touch point.  Behind a system of wranglers, I would
> probably use a document store of some sort (insert your favorite
> flavor of K/V store).

I'm a protocol guy by background.  Put the intelligence at the 
endpoints, let them talk to each other.  Minimize the role of everything 
in the middle.  (If I were designing Facebook, I'd start with NNTP, not 
a central server.  Neglecting business reasons for a centralized model, 
of course.)

I'm trying to keep my focus on the objects involved, and their behaviors 
- and keep coming back to actors as the fundamental construct, as 
opposed to classic objects (though classic might not be the right word, 
I believe the actor formalism was coined before that of object orientation).

>> See, I'm coming to the opposite conclusion.  Folks in the object-oriented
>> world came up with object-oriented databases as a way to persist huge
>> numbers of objects.  But my problem with object-oriented models is that they
>> really don't deal with flow-of-control (particularly when dealing with
>> massively concurrent problems).  And lots of things are better modeled as
>> long-lived actors, rather than objects.
> You can persist massive numbers of "things" in any standard database
> (see: Mysql / Facebook).  OO DBs were not created to solve a "scale"
> problem with storing objects, they were created to solve the ORIM
> (object-relational impedance mismatch).  They created as many problems
> as they solved in most cases, and every team I know of that used one,
> ended up regretting it in due time.  I think that is part of the
> reason you hear more about document & k/v stores than object
> persistence frameworks.  I consider OO DBs to be one of the incredible
> failures of the OO community... even worse than the nasty ORMs they
> were created to replace, at least the ORMs didn't have horrific
> lock-in, just bitter painful annoying lock-in.

Well yeah, there is that.  But the question of long-term persistence 
remains, and I can see a need for long-term persistence of large numbers 
of both object-like "things" and actor-like "things"  (as well as 
key-value pairs and other tuple-like "things").

>> I come at this after spending some years in the simulation world.  I arrived
>> at a company that made, among other things, "computer generated forces"
>> simulators - doing things like simulating 10,000 people, vehicles, and
>> weapons moving around a battlespace...
>>
>> Seems to me that lots of things (e.g., tanks) are better modeled as actors
>> than as objects - particularly if you have an environment like Erlang that
>> supports massive concurrency.  But... that leads to the question of what to
>> do with those entities when they're not doing anything - where do you "park"
>> a tank?
> These simulations run for massive amounts of time?  It seems that in
> most simulations you are looking for unexpected emergent behaviors --
> don't the actors have to be alive and responding to the events happening
> in their reactive space to do this?  You know what, there are enough
> implementation details in how to handle this that I am going to skip
> ahead...

The ones I've been involved with are more for training purposes - 
networked wargames that contain a mix of real people/equipment and 
simulated ones.  They can run for several weeks, and very often you want 
to re-run a scenario under different conditions.  And then there are 
MMORPGs - where virtual worlds persist for very long periods of time.

>> If I'm parking objects, an object database is the obvious answer.
> ... Seems obvious, but often isn't.  OO DBs have lots of issues (as
> listed before) and even with OQL still end up being more trouble than
> they are worth.  Lots of people still store billions of objects to
> (favorite RDBMS here) via ORMs -- as the tooling support, 3rd party
> integration, etc, etc, etc is better.  There is a reason in the last
> 20 years OO DBs didn't take over the world.  They are still niche.
>
>
>>   If I'm parking an actor, the answer is less obvious - I can have it
>> hibernate, but that doesn't persist across crashes, reboots, etc. - and I
>> eventually run out of PIDs.
> Seems fairly obvious to me, have the actor persist its state somewhere
> (K/V store, document store, even RDBMS or heck, an OODB if you really
> want to), and have it load it back when it is looked up... end up with
> a 1:1 actor (process) to data system.

Seems like an unnecessary step that adds confusion to one's conceptual 
model.

For example: Conceptually, actors seem like the right framework for modeling 
a tank (with either a human or software in the driver's seat) - and 
there are times when you want to "park tank no. 5 in the motor pool."  I 
really don't want to think about things like "change the tank from an 
actor to a database record when it's not doing anything" - I just want it 
to sit there, "parked" if you will.

Erlang and OTP go way beyond anything out there when it comes to 
managing huge numbers of actors, when they're doing something.  And 
hibernation is a reasonable way to view an actor that's inactive for a 
long period of time - but current mechanisms don't seem to scale all 
that well.  I'm raising the question of what to do if you want to 
persist inactive actors over extended periods of time.

>> Which leads me to the thought that an actor-oriented database would be very
>> useful for large simulations, gaming platforms, .. and the kind of "active
>> emails" that I'm thinking about.  CouchDB might be a good platform for
>> things that are document-like, but I'm getting intrigued by the more general
>> case (a "parking lot" for actors, if you will).
> Once again, seems fairly "solved"... almost every Erlang system I have
> built had this setup in some form or another ... process persists
> itself via some mechanism.  It sounds like you want something like a
> super-hibernate that moves state to some persistence layer and keeps
> the process running in an ultra minimal state... should probably be
> fairly easy to build what you want, you should give it a go and report
> back.

Exactly the conclusion I've come to.  Before building it, I figure it's 
worth seeing what other people think about the need, approach, and 
whether anybody else has built (pieces of) such a beast already.

"Super-hibernate" is a great way of putting it.  Not so sure it would be 
all that easy to build without digging into the VM.  The other choice is 
to decouple PID from the identity of super-hibernating processes - which 
would probably require recreating a whole bunch of functionality that 
Erlang already provides for handling processes and messaging.  (Now if 
someone has already done some of that.....)
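
To make the second choice concrete, here is a rough sketch of decoupling 
a stable identity from the live process, in Python for illustration 
only.  ParkingLot, Tank, park, and send are hypothetical names, and a 
dict of JSON strings stands in for whatever durable store would actually 
sit underneath:

```python
import json


class Tank:
    """Toy actor: all of its state lives in one JSON-serializable dict."""

    def __init__(self, state):
        self.state = state

    def handle(self, msg):
        if msg["op"] == "move":
            self.state["pos"] = msg["to"]
        return self.state["pos"]


class ParkingLot:
    """Routes messages by stable ID, reviving parked actors on demand."""

    def __init__(self, factory):
        self.factory = factory   # rebuilds an actor from saved state
        self.live = {}           # actor_id -> running actor object
        self.parked = {}         # actor_id -> JSON (stand-in for disk)

    def spawn(self, actor_id, state):
        self.live[actor_id] = self.factory(state)

    def park(self, actor_id):
        # "Super-hibernate": serialize the state and drop the live actor.
        actor = self.live.pop(actor_id)
        self.parked[actor_id] = json.dumps(actor.state)

    def send(self, actor_id, msg):
        # Senders only know the stable ID, never the live object.
        if actor_id not in self.live:
            state = json.loads(self.parked.pop(actor_id))
            self.live[actor_id] = self.factory(state)
        return self.live[actor_id].handle(msg)
```

From the sender's side, "tank-5" is addressable whether or not it is 
currently running.  What this leaves out is exactly the functionality 
that would have to be recreated: links, monitors, mailboxes, and 
distribution.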

Cheers,

Miles

-- 
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra