[erlang-questions] State Management Problem

Mon Dec 21 14:56:26 CET 2015

On Monday, 21 December 2015 13:11:59 you wrote:
> I think I see your point. I will definitely think about it using all the
> knowledge that I have. Databases do solve the problem but I think that for
> real time applications, going to a database would be slow. The part to
> think about for me, would be to come up with a useful API. Thanks for your
> inputs.

By the way, I've been down the path you are contemplating. I think everyone needs to do this once to really understand how tools fit problems. That is to say, I will probably never trust a system architect on a multi-faceted project unless they've gone through this once themselves.

Now I tend to write an Erlang service as an application service ("Oh no! Sooooo old fashioned!"). Clients perceive that service as the mother of all data operations, but they are high-level operations -- operations where the verb parts are relevant to the high-level business problem and the noun parts are themselves business-problem-level entities.

Lower-level operations (permanent storage, specialized indexing, specialized queries, etc.) still happen in database-type systems behind the scenes. Very often that's Postgres as a canonical store, but the Erlang application service has a "live" cache of whatever data is active, and often a request for data never makes it to the database because the data is already live in the application server. Sometimes a graph database is involved for super-fast answers to otherwise difficult queries -- but you might be surprised at how much you can lean on Postgres itself without taking a performance hit (and when you do have slow queries, Postgres has a plethora of great tools available to figure things out and tune). Sometimes a separate copy of the data is in a document, text, image, geometric, or graph db (or separate denormalized tables or even specialized tablespaces) because certain searches are just plain hard to optimize for in the general case. When you really need this sort of extra help, though, it is usually painfully obvious -- so never start out that way; instead, get the business-level logic right first.
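
To make the "live cache in front of a canonical store" idea concrete, here is a minimal sketch. It is in Python only to keep it short; the names (ApplicationService, backend_reads) are made up for illustration, a plain dict stands in for Postgres, and another dict stands in for what would be an ETS table in an Erlang service.

```python
class ApplicationService:
    """Hypothetical service: reads are answered from a live cache when possible."""

    def __init__(self, backend):
        self.backend = backend    # stand-in for the canonical store (e.g. Postgres)
        self.cache = {}           # stand-in for an ETS table of "live" data
        self.backend_reads = 0    # counts how often we really hit the backend

    def get(self, key):
        # Fast path: the data is already live in the application service.
        if key in self.cache:
            return self.cache[key]
        # Slow path: fetch from the canonical store and keep it live.
        self.backend_reads += 1
        value = self.backend[key]  # raises KeyError if the key does not exist
        self.cache[key] = value
        return value


backend = {"contract:42": {"status": "open"}}
service = ApplicationService(backend)
first = service.get("contract:42")   # goes to the backend
second = service.get("contract:42")  # served from the cache
```

Only the first request reaches the backend; every later request for the same key stays inside the service.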

Typically clients deal with a *very* small fraction of the total data in the storage backend at any given moment, and they keep hitting that tiny fraction over and over. This tends to be true whether it's game or business data, actually. (Surprising how similar they are.) Old data tends to not be of any interest, so there is this small percentage of request churn over whatever the going issue of the day happens to be (new contracts, open estimates, current projects, new HR data, some specific financial report, this month's foo, the latest dungeon content, highest-ranking player states, etc.).

There are major advantages to warehousing data in a DBA-approved relational sort of way. It makes warehousing issues *much* easier to deal with (denormalizing a copy of the data for some specific purpose, for example) than trying to take everything out of a gigantic K-V store or super-super performant but split personality "webscale" db and normalizing it after the fact (protip: you will have all sorts of random loose ends that are insanely hard to figure out, no matter what sort of "semantic tagging" system you *think* you've invented -- sometimes this is so bad that whatever analytic results your client is trying to discover turn out totally wrong).

So it still comes down to different tools for different uses. That active, high-interest data is a *perfect* fit for ETS tables and/or process state (depending on the case -- both are crazy fast, but you don't want every item in a game with 500M active items to be a process). But this data should always be rebuildable from your backend database state that comes from a stodgy old, DBA-endorsed, safety-first sort of database where data security is well understood and a large community of experts exists. When you start your application service up, the first requests are what populate your ETS tables and cause processes to spawn. When resources are unneeded, those caches shrink (processes exit, tables shrink).
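
A rough sketch of that lifecycle, again in Python with invented names (LiveCache, max_idle, sweep): entries appear on first request, idle ones are swept out, and anything swept can be rebuilt from the backend on the next request. The clock is passed in explicitly so the example is deterministic; a real service would use timers or process timeouts instead.

```python
class LiveCache:
    """Hypothetical cache that fills on demand and shrinks when entries go idle."""

    def __init__(self, load, max_idle):
        self.load = load          # function that rebuilds a value from the backend
        self.max_idle = max_idle  # seconds an entry may sit unused before eviction
        self.entries = {}         # key -> (value, time of last access)

    def get(self, key, now):
        if key in self.entries:
            value, _ = self.entries[key]
        else:
            value = self.load(key)        # first request populates the cache
        self.entries[key] = (value, now)  # record the access time
        return value

    def sweep(self, now):
        # Drop everything that has not been touched for longer than max_idle.
        stale = [k for k, (_, last) in self.entries.items()
                 if now - last > self.max_idle]
        for k in stale:
            del self.entries[k]
        return len(stale)


backend = {"player:1": "state-a", "player:2": "state-b"}
cache = LiveCache(load=backend.__getitem__, max_idle=60)
cache.get("player:1", now=0)
cache.get("player:2", now=0)
cache.get("player:1", now=50)             # player:1 stays interesting
evicted = cache.sweep(now=100)            # player:2 has been idle too long
rebuilt = cache.get("player:2", now=101)  # rebuilt from the backend on demand
```

Nothing is lost when an entry is evicted, because the backend remains the source of truth.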

Having clients talk to the application service instead of an ORM or the data backends directly lets you forget that there is a MASSIVE PROBLEM with ORM frameworks (for anything more interesting than, say, a blog website framework), because you will still have to write a translation between your application representation and the database representation. This is probably the most tedious part of writing the whole system -- but if you don't do this you will wind up re-inventing every bit of the transactional, relational, navigation-capable, document and object database paradigms all mishmashed together into an incoherent, buggy, half-baked API without realizing it (until it's too late to change your mind, that is... *then* you'll certainly realize it).
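
For a feel of what that hand-written translation looks like, here is a small sketch. The Estimate type, its fields, and the two-table row layout are all assumptions invented for the example; the point is only that the application sees one business-level entity while the database sees normalized rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Hypothetical business-level entity as the application sees it."""
    estimate_id: int
    customer: str
    line_items: tuple  # tuple of (description, price_in_cents)


def to_rows(estimate):
    """Application representation -> rows for two assumed relational tables."""
    header = (estimate.estimate_id, estimate.customer)
    items = [(estimate.estimate_id, position, description, cents)
             for position, (description, cents) in enumerate(estimate.line_items)]
    return header, items


def from_rows(header, items):
    """Rows from the assumed tables -> application representation."""
    estimate_id, customer = header
    ordered = sorted(items, key=lambda row: row[1])  # restore line order
    line_items = tuple((description, cents) for _, _, description, cents in ordered)
    return Estimate(estimate_id, customer, line_items)


original = Estimate(7, "ACME", (("design", 50000), ("build", 120000)))
header, items = to_rows(original)
restored = from_rows(header, list(reversed(items)))  # row order must not matter
```

It is tedious, but it is plain code that you control, and the round trip is easy to test.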

DON'T REINVENT THE DATABASE.

But you'll do this once, no matter what I say. Anyone who architects and then implements two huge systems does this once (either the first or the second time). And then they realize that an application-service-as-a-datasource can be a very fast, nice thing, and that databases are magical tools that you're really, really glad someone else went to the trouble to write.

Also -- none of this *solves* the distributed state problem. But there is hope! As you write systems you'll start feeling out places where it is OK to partition the problem, where temporary inconsistency in the cached data is OK (and where it is not, such as in the backend datastore), how much write lag is OK between the application service and the backend, what sort of queries are pull-your-hair-out hard or slow or slow-and-hard without a specialized database (hint: text search, image search and graph queries), and other such issues. Also -- what data is just OK to disappear POOF! when something goes wrong (ephemeral chat messages, for example).
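
The "write lag" tradeoff can be sketched as a write-behind queue. This is one possible arrangement, not a recommendation; WriteBehindCache and flush are names invented here, and a dict again stands in for the backend. Writes land in the cache immediately and reach the backend only when flush runs, so anything still pending at a crash is exactly the data you have decided is OK to lose.

```python
from collections import deque


class WriteBehindCache:
    """Hypothetical cache that accepts writes now and persists them later."""

    def __init__(self, backend):
        self.backend = backend  # stand-in for the canonical store
        self.cache = {}         # live data, always up to date
        self.pending = deque()  # writes the backend has not seen yet

    def put(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.backend[key]

    def flush(self):
        # Push queued writes to the backend in the order they were made.
        count = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.backend[key] = value
            count += 1
        return count


store = {}
service = WriteBehindCache(store)
service.put("estimate:7", "draft")
lagging = "estimate:7" not in store  # the backend has not caught up yet
seen_by_clients = service.get("estimate:7")
flushed = service.flush()
```

How often flush runs is the knob: the longer the interval, the faster the writes and the more you stand to lose.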

It's a lot to think through, and whatever tradeoffs you decide fit in specific spots are going to depend entirely on the problem you are trying to solve in *that* part of the system and the user-facing context. This is true in any language and any environment. Anyone who says differently has only ever dealt with trivial data or is a big fat liar trying to get some magical consultancy cash from you.

None of this is to intimidate or discourage you. Have fun. (Seriously!) These data conundrums are some of the most delicate and interesting problems you'll ever encounter -- and unlike algorithmic solutions to procedural problems that translate readily to arithmetic, data problems are *never* fully "solved". (Which also means this is a rabbit hole you can lose your entire career/mind in... forever!)

-Craig