[erlang-questions] State Management Problem

Mon Dec 21 14:11:59 CET 2015

I think I see your point. I will definitely think about it using all the
knowledge that I have. Databases do solve the problem, but I think that for
real-time applications, going to a database would be slow. The part for me
to think about would be coming up with a useful API. Thanks for your
input.

Aman

On Sat, Dec 19, 2015 at 4:36 AM zxq9 <zxq9@REDACTED> wrote:

> On Saturday, 19 December 2015 07:29:48 you wrote:
> > I think, that makes sense. We have to make trade-offs to make a system
> > practical.
> >
> > Though, the idea that I was pointing to is this: the FLP proof
> > <https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf> states that
> > there is no way to know if a process has really failed or its messages
> > are just delayed, yet Erlang provides a practical and working way to deal
> > with it. Similarly, can a language or standard libraries like OTP give us
> > practical ways to achieve trade-offs among Consistency, Availability and
> > Partition Tolerance? I feel that the same problem (the CAP trade-off) is
> > being solved in each system separately.
> >
> > As far as I understand, Joe Armstrong, in his thesis, argues that a
> > language can provide constructs to achieve reliability and that's how
> > Erlang came into picture. I wonder whether CAP trade-offs can also be
> > exposed using some standard set of libraries/language.
>
> Keep two (and a half) things in mind.
>
> 1. Joe was addressing a specific type of fault tolerance. There are many
> other kinds than transient bugs. This approach does nothing to magically
> fix hardware failures, for example.
>
> 2. The original idea does not appear to have been focused on massively
> distributed applications, but rather on making concurrent processes be the
> logical unit of computation instead of memory-shared objects. (Hence
> "concurrency oriented programming" instead of "object oriented
> programming".)
>
> 2.5 This was done independently of "the actor model". Today that's a nice
> buzzword, and Erlang is very nearly the actor model, but that wasn't the
> point.
>
> This was achieved by forcing a large number of concurrency tradeoffs onto
> the programmer from the start, tradeoffs that are normally left up to the
> programmer: how to queue messages; what "message sequence" means (and
> doesn't mean); whether there is a global "tick" all processes can refer to;
> if shared memory, message queues, and member function calls are all a big
> mixed bag of signaling possibilities or not; how semaphore/lock centric
> code should be; whether messaging is fundamentally asynchronous or
> synchronous; how underlying resources are scheduled; etc.
>
> The tradeoffs that were made do not suit every use case -- but it turns
> out that most of the tradeoffs suit a surprisingly broad variety of (most?)
> programming needs. TBH, a lot of the magical good that is in Erlang seems
> to have been incidental. (Actually, most of the good things that are in the
> world at all seem to have been incidental.)
>
> Point 2.5 is interesting to me personally because I had very nearly
> written a crappier version of OTP myself in (mostly) Python based around
> system processes and sockets to support a large project before realizing I
> could instead simplify things dramatically by just using Erlang. Joe's idea
> and Carl Hewitt's "Actor Model" were developed independently of one another
> but take a *very* similar approach -- one in the interest of real-world
> practicality, the other in the interest of developing an academically pure
> model of computation. This indicates to me that the basic idea of
> process-based computation lies very close to an underlying fundamental
> truth about information constructs. That I felt compelled to architect a
> system in a very similar manner after running into several dead-ends with
> regard to both concurrency-imposed complexity *and* fault tolerance (in
> ignorance of both OTP and the Actor Model) only cements this idea in my
> mind more firmly.
>
> But none of this addresses distributed state.
>
> There are a million aspects here that have to be calibrated to a
> particular project before you can even have a meaningful idea what
> "distributed state" means in any practical sense. This is because most of
> the state in most distributed systems doesn't matter to the *entire* system
> at any given moment. For example, in a game server do we care what the
> "global state" is? Ever? Of course not. We only care that enough related
> state is consistent enough that gameplay meets expectations. Here we
> consciously choose to accept some forms of inconsistency in return for
> lower latency (higher availability). If one node crashes we'll have to
> restart it but it shouldn't affect the rest of the system. Let me back up
> and make this example more concrete, because games today often represent an
> abnormally complete microcosm (and many of them are designed poorly, but
> there is a lot to learn from various missteps in games).
>
> We have mobs, players, items, chat services, financial services, a user
> identity service, a related online forum, a purchasing/information website,
> a web interface for game ranking and character stats, the map, and a
> concept of "live zones".
>
> Mobs are basically non-essential in that they will respawn on a timer if
> they get killed or crash but their respawn state is always known.
>
> Player characters are non-essential, but in a different way: their restart
> *state* must be transactionally guaranteed -- they can't just lose all
> their items or score or levels or whatever when they die or if a managing
> process crashes.
>
> Item trade *must* be given transactional guarantees, otherwise duplicates
> can occur in trade or if two players pick up an item "at the same time",
> duplicate references can occur, or items might disappear from the game
> entirely (a worse outcome of the same bug -- parts of the game may break in
> fundamental ways).
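
To make the item case concrete, here is a minimal sketch of the "exactly one winner" guarantee. It assumes a single process where a lock stands in for whatever transactional store a real game would use. `ItemRegistry` and all of its method names are hypothetical, invented for illustration only.

```python
import threading


class ItemRegistry:
    """Hypothetical single-owner item registry (illustrative names only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = {}  # item_id -> owner name, or None while on the ground

    def spawn(self, item_id):
        with self._lock:
            self._owner[item_id] = None

    def pick_up(self, item_id, player):
        """Atomically claim an unowned item; only one caller can succeed."""
        with self._lock:
            if item_id in self._owner and self._owner[item_id] is None:
                self._owner[item_id] = player
                return True
            return False

    def trade(self, item_id, seller, buyer):
        """Move ownership only if the seller really owns the item right now."""
        with self._lock:
            if self._owner.get(item_id) != seller:
                return False
            self._owner[item_id] = buyer
            return True

    def owner_of(self, item_id):
        with self._lock:
            return self._owner.get(item_id)
```

The check and the update happen under the same lock, so two players grabbing the item "at the same time" cannot both win, and a trade cannot duplicate or drop the reference. A distributed version would have to get the same property from its storage layer, which is exactly where the latency cost comes in.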
>
> Chat is global, but totally asynch. It's also non-critical. If clients see
> messages chat works. If they don't it's down, or nobody is chatting.
> Gameplay is utterly unaffected.
>
> Finance requires firm transactional guarantees (and a whole universe-worth
> of political comedy).
>
> User identities are really *account* identities, and so storage requires
> transactional guarantees, but login status requires a separate approach
> (stateless, authentication/verification based or whatever).
>
> Online forums are forums. This is a solved problem, right? Sure, maybe.
> But in what ways has it been "solved"? This is itself a whole separate
> world of approaches, backends, frontends, authentication bridges (because
> this must tie into the existing user identity system), etc.
>
> Web interface for stats, scores and ranks appears similar to the forums
> issue at first, but the data it will draw from and display is *completely*
> game related. At what point should this update? How closely should changes
> in the game be reflected here? Is the overhead worth making it "live"? Are
> players the ones who check their own pages, or do others? Etc. A whole
> world of totally different tradeoffs apply here -- and they may be critical
> to the community model that drives the core game business. (ouch!)
>
> The base map data itself is usually static and globally known -- to the
> point that every game client has a copy of it; awesome for performance but
> means map updates are *game client* updates. Anything special that is
> occurring in a map zone at a given moment is a function of map data overlay
> local to a zone (typically) -- and that can change on the fly, but only
> temporarily while that zone's host node is alive.
>
> The map is divided into zones, and each zone is tied to its host node. If
> a node goes down that "region" is inaccessible for the moment (and
> everything in it will crash and need to respawn elsewhere), but this
> isolates faults by region, which makes reasoning about it for players and
> developers fairly easy.
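
The zone-per-node arrangement can be sketched as a plain lookup table. This is only an illustration of the fault-isolation idea; `ZoneMap`, the zone names, and the node names are all made up, and a real system would detect node failure rather than be told about it.

```python
class ZoneMap:
    """Hypothetical mapping of map zones to the nodes that host them."""

    def __init__(self, zone_to_node):
        self._zone_to_node = dict(zone_to_node)
        self._down = set()  # nodes currently considered dead

    def node_down(self, node):
        """Mark a node dead and return the zones that just became inaccessible."""
        self._down.add(node)
        return sorted(z for z, n in self._zone_to_node.items() if n == node)

    def node_up(self, node):
        self._down.discard(node)

    def is_accessible(self, zone):
        return self._zone_to_node[zone] not in self._down


zones = ZoneMap({"forest": "node_a", "swamp": "node_a", "castle": "node_b"})
lost = zones.node_down("node_a")  # only node_a's zones are affected
```

Losing `node_a` takes out "forest" and "swamp" and nothing else, which is what makes the failure easy to reason about.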
>
> ...And so on, and so on, et cetera, et cetera...
>
> Think through how distributed state works in each of these cases. There
> are tons of cases, and each one requires different tradeoffs. How would you
> go about writing a *generic* library that would provide a useful
> abstraction of *all* of those cases (and all the ones I left out -- game
> servers are complicated!)?
>
> Much of this is not language-specific, or even framework-specific, because
> programming at that level is all about what the data is doing while it is
> alive. In fact, I argue (and *do* argue this, passionately at times) that
> while the basic infrastructure of the system is usually best handled in
> Erlang, Erlang is *not* the best tool for *every* part of the system. This
> is also true of data storage.
>
> Most of the issues involving distributed state in a system cross the
> barrier between hardware and software, and are trickier to solve in generic
> ways. That's why hardcore data people still talk in terms of spindles and
> read VS write latency instead of just generally talking about iops (unless
> they are stuck in a cloud service, in which case their game service is
> nearly guaranteed to fail for lack of the remotest clue what their SLA
> means in terms of actually trying to play the game). That's why when you
> buy EMC or NetApp services you have to configure the system to provide the
> sort of performance tradeoffs that fit your system instead of just buying
> them like they were refrigerators. And this is just the hardware side of
> things. Trying to write a generic system to handle distributed state in
> software would require knowing quite a bit about the hardware or else
> choices would be made totally in the blind.
>
> The result is that OTP handles state in terms of "safe state to recover
> and restart", and that's about it. The rest is up to you. Outside of Erlang
> we have database systems that have chosen various tradeoffs already, and we
> can pick from them where necessary. Some systems take the "transactions or
> bust, latency be damned" approach, other databases take the "conflicts will
> happen, or they won't and messages will be received, or they won't, but who
> cares, because PERFORMANCE!!!" approach, etc.
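
As a rough illustration of "safe state to recover and restart", and emphatically not OTP's actual API, the idea can be sketched in a few lines. `supervise`, `counter`, and `max_restarts` are hypothetical names; the only thing the sketch assumes is that a known-good starting state exists.

```python
def supervise(worker, safe_state, inputs, max_restarts=3):
    """Feed inputs to worker(state, item); on a crash, restart from safe_state.

    Whatever the worker accumulated before the crash is thrown away. Only the
    safe state is guaranteed to survive.
    """
    state = dict(safe_state)
    restarts = 0
    for item in inputs:
        try:
            state = worker(state, item)
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up, as a supervisor eventually would
            state = dict(safe_state)
    return state, restarts


def counter(state, item):
    """Toy worker that crashes on bad input."""
    if item is None:
        raise ValueError("bad input")
    return {"count": state["count"] + item}


final_state, restarts = supervise(counter, {"count": 0}, [1, 2, None, 5])
```

The count built up before the bad input is simply gone after the restart. Anything that must outlive the crash has to be put somewhere else, and choosing that somewhere is the part left up to you.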
>
> It's not just that nobody can pick the perfect CAP tradeoff for all cases.
> Even more fundamentally, there is no single database system that is a
> perfect solution for all types of data. Sure, you can emulate any type of
> data in a full-service RDBMS, but sometimes it's better to have your
> canonical, normalized storage in Postgres but be asking questions to a
> graph database that is optimized for graph queries, or making text
> search queries to a system optimized for that, or "schemaless" document
> databases that don't really provide guarantees of any sort (so you check
> yourself in other code) but are super fast and easy to understand for some
> specific case (like YouTube discussion comments that aren't of critical
> importance to humanity). This difference is what underlies thinking in data
> warehousing and archive retrieval (when people actually think, that is).
>
> If we can't even pick a "generic database" then it's impossible to imagine
> that any language runtime could come batteries included with a "generic
> distributed state management framework". That would be a real trick indeed.
> If you happen to figure out how to swing that, give me a call -- I'll help
> you implement it and we'll be super famous, if not fabulously wealthy.
>
> Just to give yourself some brain food -- try describing what the API or
> interface to such a system would even look like. If that's hard to even
> imagine then the problem is usually more fundamental to the problem domain.
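
One way to start on that exercise is to write down what such an interface would have to be told about each kind of state. The sketch below is only that: the `Durability` levels, the `StatePolicy` fields, and the values assigned to each kind are one reading of the game examples in this thread, not an existing library or a proposed design.

```python
from dataclasses import dataclass
from enum import Enum


class Durability(Enum):
    NONE = "respawn from known defaults"
    SNAPSHOT = "restore last committed snapshot"
    TRANSACTIONAL = "every change committed atomically"


@dataclass(frozen=True)
class StatePolicy:
    durability: Durability
    tolerate_stale_reads: bool  # assumed knob: may readers see old data?


# Values are assumptions drawn from the examples above, not measurements.
POLICIES = {
    "mob": StatePolicy(Durability.NONE, True),
    "player_character": StatePolicy(Durability.SNAPSHOT, True),
    "item_trade": StatePolicy(Durability.TRANSACTIONAL, False),
    "chat": StatePolicy(Durability.NONE, True),
    "finance": StatePolicy(Durability.TRANSACTIONAL, False),
}


def policy_for(kind):
    """Look up the tradeoff chosen for one kind of state."""
    return POLICIES[kind]
```

Even with only two knobs, the "generic" interface turns into a table that someone has to fill in per case, and it still says nothing about latency budgets, hardware, or what happens during a partition.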
>
> -Craig