[erlang-questions] State Management Problem

Sat Dec 19 10:35:52 CET 2015

On 2015年12月19日 土曜日 07:29:48 you wrote:
> I think, that makes sense. We have to make trade-offs to make a system
> practical.
> 
> Though, the idea that I was pointing to is that, just like FLP proof
> <https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf> states that
> there is no way to know if a process has really failed or the messages are
> just delayed but Erlang provides a practical and working way to deal with
> it. Similarly, can a language or standard libraries like OTP give us
> practical ways to achieve trade-offs among Consistency, Availability and
> Partition Tolerance? I feel that the same problem (the CAP trade-off) is
> being solved in each system separately.
> 
> As far as I understand, Joe Armstrong, in his thesis, argues that a
> language can provide constructs to achieve reliability and that's how
> Erlang came into picture. I wonder whether CAP trade-offs can also be
> exposed using some standard set of libraries/language.

Keep two (and a half things) in mind.

1. Joe was addressing a specific type of fault tolerance. There are many other kinds than transient bugs. This approach does nothing to magically fix hardware failures, for example.

2. The original idea does not appear to have been focused on massively distributed applications, but rather on making concurrent processes be the logical unit of computation instead of memory-shared objects. (Hence "concurrency oriented programming" instead of "object oriented programming".)

2.5 This was done indepenently of "the actor model". Today that's a nice buzzword, and Erlang is very nearly the actor model, but that wasn't the point.

This was achieved by forcing a large number of concurrency tradeoffs onto the programmer from the start, tradeoffs that are normally left up to the programmer: how to queue messages; what "message sequence" means (and doesn't mean); whether there is a global "tic" all processes can refer to; if shared memory, message queues, and member function calls are all a big mixed bag of signaling possibilities or not; how semaphore/lock centric code should be; whether messaging was fundamentally asynchronous or synchronous; how underlying resources are scheduled; etc.

The tradeoffs that were made do not suit every use case -- but it turns out that most of the tradeoffs suit a surprisingly broad variety of (most?) programming needs. TBH, a lot of the magical good that is in Erlang seems to have been incidental. (Actually, most of the good things that are in the world at all seem to have been incidental.)

Point 2.5 is interesting to me personally because I had very nearly written a crappier version of OTP myself in (mostly) Python based around system processes and sockets to support a large project before realizing I could instead simplify things dramatically by just using Erlang. Joe's idea and Carl Hewitt's "Actor Model" were developed independently of one another but take a *very* similar approach -- one in the interest of real-world practicality, the other in the interest of developing an academically pure model of computation. This indicates to me that the basic idea of process-based computation lies very close to an underlying fundamental truth about information constructs. That I felt compelled to architect a system in a very similar manner after running into several dead-ends with regard to both concurrency-imposed complexity *and* fault tolerance (in ignorance of both OTP and the Actor Model) only cements this idea in my mind for firmly.

But none of this addresses distributed state.

There are a million aspects here that have to be calibrated to a particular project before you can even have a meaningful idea what "distributed state" means in any practical sense. This is because most of the state in most distributed systems don't matter to the *entire* system at any given moment. For example, in a game server do we care what the "global state" is? Ever? Of course not. We only care that enough related state is consistent enough that gameplay meets expectations. Here we consciously choose to accept some forms of inconsistency in return for lower latency (higher availability). If one node crashes we'll have to restart it but it shouldn't affect the rest of the system. Let me back up and make this example more concrete, because games today often represent an abnormally complete microcosm (and many of them are designed poorly, but there is a lot to learn from various missteps in games).

We have mobs, players, items, chat services, financial services, a user identity service, a related online forum, a purchasing/information website, a web interface for game ranking and character stats, the map, and a concept of "live zones".

Mobs are basically non-essential in that they will respawn on a timer if they get killed or crash but their respawn state is always known.

Player characters are non-essential, but in a different way: their restart *state* must be transactionally guaranteed -- they can't just lose all their items or score or levels or whatever when they die or if a managing process crashes.

Item trade *must* be given transactional guarantees, otherwise duplicates can occur in trade or if two players pick up an item "at the same time", duplicate references can occur, or items might disappear from the game entirely (a worse outcome of the same bug -- parts of the game may break in fundamental ways).

Chat is global, but totally asynch. Its also non-critical. If clients see messages chat works. If they don't its down, or nobody is chatting. Gameplay is utterly unaffected.

Finance requires firm transactional guarantees (and a whole universe-worth of political comedy).

User identities are really *account* identities, and so storage requires transactional guarantees, but login status requires a separate approach (stateless, authentication/verification based or whatever).

Online forums are forums. This is a solved problem, right? Sure, maybe. But in what ways has it been "solved"? This is itself a whole separate world of approaches, backends, frontends, authentication bridges (because this must tie into the existing user identity system), etc.

Web interface for stats, scores and ranks appears similar to the forums issue at first, but the data it will draw from and display is *completely* game related. At what point should this update? How closely should changes in the game be reflected here? Is the overhead worth making it "live"? Are players the ones who check their own pages, or do others? Etc. A whole world of totally different tradeoffs apply here -- and they may be critical to the community model that drives the core game business. (ouch!)

The base map data itself is usually static and globally known -- to the point that every game client has a copy of it; awesome for performance but means map updates are *game client* updates. Anything special that is occurring in a map zone at a given moment is a function of map data overlay local to a zone (typically) -- and that can change on the fly, but only temporarily while that zone's host node is alive.

The map is divided into zones, and each zone is tied to its host node. If a node goes down that "region" is inaccessible for the moment (and everything in it will crash and need to respawn elsewhere), but but isolates faults by region which makes reasoning about it for players and developers fairly easy.

...And so on, and so on, et cetera, et cetera...

Think through how distributed state works in each of these cases. There are tons of cases, and each one requires different tradeoffs. How would you go about writing a *generic* library that would provide a useful abstraction of *all* of those cases (and all the ones I left out -- game servers are complicated!).

Much of this is not language-specific, or even framework specific. Because programming at that level is all about what the data is doing while it is alive. In fact, I argue (and *do* argue this, passionately at times) that while the basic infrastructure of the system is usually best handled in Erlang, Erlang is *not* the best tool for *every* part of the system. This is also true of data storage.

Most of the issues involving distributed state in a system cross the barrier between hardware and software, and are trickier to solve in generic ways. That's why hardcore data people still talk in terms of spindles and read VS write latency instead of just generally talking about iops (unless they are stuck in a cloud service, in which case their game service is nearly guaranteed to fail for lack of the remotest clue what their SLA means in terms of actually trying to play the game). That's why when you buy EMC or NetApp services you have to configure the system to provide the sort of performance tradeoffs that fit your system instead of just buying them like they were refrigerators. And this is just the hardware side of things. Trying to write a generic system to handle distributed state in software would require knowing quite a bit about the hardware or else choices would be made totally in the blind.

The result is that OTP handles state in terms of "safe state to recover and restart", and that's about it. The rest is up to you. Outside of Erlang we have database systems that have chosen various tradeoffs already, and we can pick from them where necessary. Some systems take the "transactions or bust, latency be damned" approach, other databases make the "conflicts will happen, or they won't and messages will be received, or they won't, but who cares, because PERFORMANCE!!!" approach, etc.

Its not just that nobody can pick the perfect CAP tradeoff for all cases. Even more fundamentally, there is no single database system that is a perfect solution for all types of data. Sure, you can emulate any type of data in a full-service RDBMS, but sometimes its better to have your canonical, normalized storage in Postgres but be asking questions to a graphing database that is optimized for graph queries, or making text search queries to a system optimized for that, or "schemaless" document databases that don't really provide guarantees of any sort (so you check yourself in other code) but are super fast and easy to understand for some specific case (like youtube discussion comments that aren't of critical importance to humanity). This difference is what underlies thinking in data warehousing and archive retrieval (when people actually think, that is).

If we can't even pick a "generic database" then its impossible to imagine that any language runtime could come batteries included with a "generic distributed state management framework". That would be a real trick indeed. If you happen to figure out how to swing that, give me a call -- I'll help you implement it and we'll be super famous, if not fabulously wealthy.

Just to give yourself some brain food -- try describing what the API or interface to such a system would even look like. If that's hard to even imagine then the problem is usually more fundamental to the problem domain.

-Craig