[erlang-questions] Massive Numbers of Actors vs. Massive Numbers of Objects vs. ????

Thu Mar 1 00:04:03 CET 2012

Robert Melton wrote:
> Hey Miles!  Interesting topic.

Hey Rob, Thanks!  It's fun to bat around (and useful).
>
> On Tue, Feb 28, 2012 at 12:03 PM, Miles Fidelman
> <mfidelman@REDACTED>  wrote:
>> Think of something like massive numbers of stored email messages, where each
>> message is addressable, can respond to events, and in some cases can
>> initiate events -
> I think this is a poor way to structure the problem.  An "email" is a
> static lifeless thing that can be acted on by processes.  I would
> consider an email-message something that can be serialized down to the
> disk, it is data.  What you need is a number of email_wrangler
> processes, each one tied to a single piece of data, your "email".

I guess I'm thinking that today's emails often contain HTML and 
JavaScript - i.e., they're no longer "lifeless."  Concatenating them 
into mbox files no longer seems like a good way to deal with them.  Seems 
like each message is better modeled as either an object or an actor, or 
some hybrid.

>> for example, on receiving an email message, a reader can
>> fill in a form, and have that update every copy of the message spread across
>> dozens (or hundreds, or thousands) of mailboxes/folders distributed across
>> the Internet.
> Please explain the use case, I can't think of any use case that would
> demand this structure.  If the idea is to have a public-state (shared
> among everyone who got the email) and a private state (unique to that
> user) -- I think it would make way more sense to separate the storage
> of these two concerns and apply updates respectively (parent-child
> style).  If the idea is that all state is shared, why copy?  If the
> idea is that all state is private, why message?

Simple case: I distribute a document (say an article).  I then want to 
distribute an update that gets auto-applied, rather than leaving it to 
the recipient to "replace paragraph x with <this text>".

Particularly useful when dealing with formal documents - e.g. project 
plans - that have a formal update process associated with them.  Email 
copies of documents that contain their own update rules (who can send 
updates, how they get applied, conflict resolution rules).  Then email 
updates that get passed to the original document, which updates itself.  
(Not that different than emailing software patches.)
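
A minimal sketch of the self-updating document described above, in Python for brevity rather than Erlang. Everything here is an assumption made for illustration: the names Update and Document, the rule that only listed senders may update, and the conflict rule that an update must be written against the current version.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    sender: str        # who is sending the update
    base_version: int  # document version the update was written against
    paragraph: int     # index of the paragraph to replace
    text: str          # replacement text


@dataclass
class Document:
    paragraphs: list
    allowed_senders: set                # rule: who can send updates
    version: int = 0
    rejected: list = field(default_factory=list)

    def apply(self, update):
        """Apply an update only if the document's own rules allow it."""
        if update.sender not in self.allowed_senders:
            self.rejected.append((update, "sender not allowed"))
            return False
        # Assumed conflict rule: reject updates written against an old version.
        if update.base_version != self.version:
            self.rejected.append((update, "stale version"))
            return False
        if not 0 <= update.paragraph < len(self.paragraphs):
            self.rejected.append((update, "no such paragraph"))
            return False
        # Rule for how updates get applied: replace one paragraph in place.
        self.paragraphs[update.paragraph] = update.text
        self.version += 1
        return True
```

Each mailbox copy would hold its own Document and feed incoming update messages to apply(); copies that receive the same updates in the same order end up with the same text.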

>> Or where an email message, stored in some folder, can wake up
>> and send a reminder.
> The email_wrangler could do this, or an email_scheduler could start the
> email_wrangler when it is time to do that work... which is how I would
> probably structure it, no reason to leave email_wranglers running.

The wrangler concept seems conceptually clumsy - compared to, say, 
objects and triggers stored in an object database.  What's the 
actor-oriented equivalent?  (Actually, there is a middle ground - 
document databases - and CouchDB comes to mind.)

> I am currently building a system with millions of concurrent processes
> across a cluster of Erlang nodes.  But, these processes aren't left up
> just rotting, they are spawned in reaction to events, and live for a
> responsible and useful time period in which they do work.  I can not
> fathom why you would DESIGN a system to have huge numbers of
> persistent inactive actors.
>
>
>> In some sense, I'm describing an "actor-oriented database" - a place to park
>> large numbers of persistent actors, surrounded by mechanisms to deliver
>> messages, and allow them to wake up after timeouts.
>>
>> I'm kind of surprised somebody hasn't built such a beast - at least as a
>> research experiment.
> I would never see a use for such a beast when everything it
> accomplishes can be done easier using existing idioms with less memory
> and CPU usage... unless I entirely missed something.

See, I'm coming to the opposite conclusion.  Folks in the 
object-oriented world came up with object-oriented databases as a way to 
persist huge numbers of objects.  But my problem with object-oriented 
models is that they really don't deal with flow-of-control (particularly 
when dealing with massively concurrent problems).  And lots of things 
are better modeled as long-lived actors, rather than objects.

I come at this after spending some years in the simulation world.  I 
arrived at a company that made, among other things, "computer generated 
forces" simulators - doing things like simulating 10,000 people, 
vehicles, and weapons moving around a battlespace (think massively 
multi-player game).  My first guess was that each entity was modeled as 
a light-weight thread, but all of our C++ programmers assured me that, 
no, you can't do that, context switching overhead kills you.  Instead 
everything was modeled as an object, and four main threads would touch 
every object once per simulation cycle (40 times a second or so).  
Conceptually ugly, and lots of spaghetti code.  I discovered Erlang when 
looking for other ways to do things.

Seems to me that lots of things (e.g., tanks) are better modeled as 
actors than as objects - particularly if you have an environment like 
Erlang that supports massive concurrency.  But... that leads to the 
question of what to do with those entities when they're not doing 
anything - where do you "park" a tank?  If I'm parking objects, an 
object database is the obvious answer.  If I'm parking an actor, the 
answer is less obvious - I can have it hibernate, but that doesn't 
persist across crashes, reboots, etc. - and I eventually run out of PIDs.

Which leads me to the thought that an actor-oriented database would be 
very useful for large simulations, gaming platforms, ... and the kind of 
"active emails" that I'm thinking about.  CouchDB might be a good 
platform for things that are document-like, but I'm getting intrigued by 
the more general case (a "parking lot" for actors, if you will).
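
A rough sketch of what such a "parking lot" might look like, in Python for brevity rather than Erlang. All names (ParkingLot, park, deliver, wake_at, tick, tank) are hypothetical, and the in-memory dict of JSON strings is only a stand-in for real persistent storage. A parked actor is just serialized state plus the name of its behaviour; it occupies no process until a message or a timeout arrives.

```python
import heapq
import json


class ParkingLot:
    """Parks actors as serialized state; wakes one only to handle a message."""

    def __init__(self, behaviours):
        self.behaviours = behaviours  # kind -> function(state, message) -> new state
        self.store = {}               # actor_id -> JSON string (stand-in for disk)
        self.timers = []              # heap of (wake_time, seq, actor_id, message)
        self.seq = 0                  # tie-breaker so heap entries always compare

    def park(self, actor_id, kind, state):
        # Serialize the actor; nothing stays resident for it after this.
        self.store[actor_id] = json.dumps({"kind": kind, "state": state})

    def deliver(self, actor_id, message):
        # Wake the actor, let it handle one message, then park it again.
        record = json.loads(self.store[actor_id])
        handler = self.behaviours[record["kind"]]
        new_state = handler(record["state"], message)
        self.park(actor_id, record["kind"], new_state)
        return new_state

    def wake_at(self, when, actor_id, message):
        # Schedule a message to be delivered once the clock reaches `when`.
        heapq.heappush(self.timers, (when, self.seq, actor_id, message))
        self.seq += 1

    def tick(self, now):
        # Deliver every scheduled message that is due; return how many fired.
        fired = 0
        while self.timers and self.timers[0][0] <= now:
            _, _, actor_id, message = heapq.heappop(self.timers)
            self.deliver(actor_id, message)
            fired += 1
        return fired


def tank(state, message):
    # Example behaviour: a parked tank that moves along x when told to.
    if message["type"] == "move":
        return {**state, "x": state["x"] + message["dx"]}
    return state
```

The sketch keeps the timer queue in memory; a real version would have to persist timers as well, or parked actors would lose their wake-ups across a crash or reboot.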

Miles

-- 
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra