[erlang-questions] Using processes to implement business logic
Camille Troillard
lists@REDACTED
Fri Jan 30 14:11:14 CET 2015
Hi Craig,
Thank you for your answer; it is of incredible value.
I foresee more questions, related to distribution... when I get there.
Cam
On 30 Jan 2015, at 13:27, zxq9 <zxq9@REDACTED> wrote:
> On Friday, 30 January 2015 11:52:27, Camille Troillard wrote:
>> Hi,
>>
>> I am looking for opinions about using processes to encapsulate the state of
>> business entities.
>>
>> It looks like, in the context of our problem, we should have more advantages
>> implementing our domain model using processes rather than simple Erlang
>> records. It also appears that processes will act as a nice cache layer in
>> front of the persistent storage.
>>
>> So, what are your experiences?
>
> I've found processes to be extremely flexible with regard to representing
> business entity state. There are a few things to consider before you can gain
> much from process-based encapsulation of state, though.
>
> A determination must be made about what a useful granularity is for your
> business entities. In the case I deal with I have found it useful to start
> with a completely normalized relational data schema and build a hierarchy of
> structures useful to users up from there. It looks something like this (a
> rough record-syntax sketch follows the list):
>
> * Elementary record
> - As low as it gets; a schema of normalized relations.
>
> * Record
> - Practically useful assembly of elementary records and other records.
>
> * Document
> - Wraps whatever level of record the user wants to deal with in display,
> export and editing rules. This is the essence of a client-side application
> (regardless of what language or paradigm the client is written in -- I've toyed
> with wxErlang for this, but Qt has sort of been a necessity because of ease of
> cross platform deployment).
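>
> To make that concrete, here is a rough sketch in plain record syntax (every
> name below is invented purely for illustration):
>
>   %% Elementary records: rows straight out of the normalized schema.
>   -record(person,  {id, family_name, given_name}).
>   -record(address, {id, person_id, line1, city, country}).
>
>   %% A "record" in the sense above: a practically useful assembly.
>   -record(contact, {person :: #person{},
>                     addrs  :: [#address{}]}).
>
>   %% A "document": a record wrapped in display/export/editing rules.
>   -record(contact_doc, {contact    :: #contact{},
>                         edit_rules :: map(),
>                         render     :: fun((#contact{}) -> iodata())}).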
>
> One form of business logic is encapsulated by the relational rules and the
> shape of the relational schema. A choice has to be made whether to make
> changes cascade at the database level or within application server code. There
> is no "right" answer to the question of what level to propagate data updates,
> but the shape of the data being declared to follow a strict normalized
> relational schema is important if extensibility is a concern (and with
> business data it always is).
>
> My choice has been to propagate notification of changes among record processes
> according to whatever other processes or external entities are subscribed to
> update notifications, but have the database schema cascade changes to foreign
> keys on its own (normalized relations primarily consist of primary keys and
> foreign keys, though). This choice forces a commitment to having client code
> (or record processes) calculate derived values, and to using the database rules
> only for data integrity enforcement.
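>
> A tiny, purely illustrative example of what "derived values live in Erlang"
> means (the record and function names are made up):
>
>   -record(line, {id, qty, price}).
>
>   order_total(Lines) ->
>       lists:sum([Q * P || #line{qty = Q, price = P} <- Lines]).
>
>   %% Called when a change notification for one line arrives; the database is
>   %% only trusted to keep the keys consistent, never to compute the total.
>   on_line_changed(NewLine, Lines0) ->
>       Lines = lists:keystore(NewLine#line.id, #line.id, Lines0, NewLine),
>       {order_total(Lines), Lines}.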
>
> Where before I had used materialized views to cache sets of data, I now use
> mnesia as a denormalized cache. Mnesia cannot store tables larger than 2GB,
> but this has not been a practical limitation within a single
> installation/client site (so long as BLOBs are stored as files, and only
> references to them are stored in the database rows). If this ever does become
> a limitation a caching strategy other than general memoization will become
> useful, but I've not hit any walls yet.
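>
> For what it's worth, the cache side looks roughly like this (table and field
> names are invented; only file paths to BLOBs go into the row):
>
>   -record(doc_cache, {doc_id, fields = #{}, blob_paths = []}).
>
>   init_cache() ->
>       mnesia:create_table(doc_cache,
>                           [{attributes, record_info(fields, doc_cache)},
>                            {ram_copies, [node()]}]).
>
>   cache_put(DocId, Fields, BlobPaths) ->
>       mnesia:dirty_write(#doc_cache{doc_id     = DocId,
>                                     fields     = Fields,
>                                     blob_paths = BlobPaths}).
>
>   cache_get(DocId) ->
>       case mnesia:dirty_read(doc_cache, DocId) of
>           [Row] -> {ok, Row};
>           []    -> miss
>       end.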
>
> Records that a user has "open" or "active" are instantiated as processes.
> These records subscribe to the records they depend on so they receive/push
> updates among each other. In this way User A using Client A can update some
> data element X, and X will notify its underlying record process, which will
> propagate the change across the system downward to the underlying records and
> database, and upward to User B on Client B who has a related document open.
> This can take some getting used to for users who have grown accustomed to the
> typical "refresh the web page to see updates" form of editing or single-user
> business applications. (At the moment these living records exist on the
> application server, but it could be a delegated task if the clients were also
> Erlang nodes (but not a part of the server's cluster), if each table's owning
> process managed the subscription system instead of each record. Just haven't
> gotten that far yet.)
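>
> In case it helps, the shape of such a "live" record process is roughly the
> following sketch (module, message and field names are all hypothetical, and
> error handling is omitted):
>
>   -module(live_record).
>   -behaviour(gen_server).
>   -export([start_link/2, subscribe/1, update/2]).
>   -export([init/1, handle_call/3, handle_cast/2, handle_info/2]).
>
>   -record(state, {id, data, deps = [], subs = []}).
>
>   start_link(Id, Deps) -> gen_server:start_link(?MODULE, {Id, Deps}, []).
>
>   subscribe(Rec)    -> gen_server:call(Rec, {subscribe, self()}).
>   update(Rec, Data) -> gen_server:cast(Rec, {update, Data}).
>
>   init({Id, Deps}) ->
>       %% Register interest in every record this one is assembled from.
>       [ok = subscribe(Dep) || Dep <- Deps],
>       {ok, #state{id = Id, deps = Deps}}.
>
>   handle_call({subscribe, Pid}, _From, State = #state{subs = Subs}) ->
>       {reply, ok, State#state{subs = [Pid | Subs]}}.
>
>   handle_cast({update, Data}, State) ->
>       %% A user edited this record: store and fan out to subscribers.
>       notify(State#state.subs, State#state.id, Data),
>       {noreply, State#state{data = Data}}.
>
>   handle_info({changed, _DepId, _What}, State) ->
>       %% A dependency changed: recompute whatever is derived, then fan out.
>       notify(State#state.subs, State#state.id, State#state.data),
>       {noreply, State}.
>
>   notify(Subs, Id, What) ->
>       [Pid ! {changed, Id, What} || Pid <- Subs],
>       ok.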
>
> This sort of data handling requires a lot of consideration about what
> "normalization" means, and also care when defining the record schemas. From
> records, though, it is easy to write OOP GUI code, or process-based wxErlang
> GUI code (which is easier, but harder to deploy on Windows, and impossible on
> mobile just now) without your head exploding, and gets you past the "Object-
> Relational Mismatch" problem. The tradeoff is all that thought that goes into
> both the relational/elementary record schema and the aggregate record schemas,
> which turn out to look very different. It requires a considerable amount of
> time to get folks who have only ever used an ORM framework on track with doing
> Objects-as-processes/records and elementary records as normalized relations --
> I have not found a magic shortcut to this yet.
>
> You will not get the schemas right the first time, or the second. Any data
> that is "just obvious" at first will probably prove to be non-trivial at its
> root. For example:
> - tracking people's names in different languages
> - properly dealing with scripts instead of just "languages"
> - doing addresses + location properly
> - making the intuitive leap that families are more like contract organizations
> which cover a time span instead of a simple {Husband, Wife, [Kids]} tuple
> - business relationship tracking
> - event timelines
> - non-Western units of measure
> - anything to do with calendars
> - etc.
>
> Even without all this architecture and just beginning with a relatively dirty,
> denormalized schema in mnesia or ETS tables it is possible to see how much
> more interesting "live" records defined as processes that are aware of their
> interdependencies can be. Combining this with a subscribe/publish model is
> very natural in Erlang. But even with a smallish store of business data you
> will have to find a way to distinguish between an "active" record and one that
> needs to reside latent as a collection of rows in tables. If you instantiate
> everything you can quickly find yourself trying to spawn not a few tens of
> thousands, but millions of processes (I think this is why you ask your next
> question below).
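>
> One way to keep records latent until needed, sketched with the built-in
> global registry (deps_of/1 is a hypothetical lookup, and a real system would
> want gproc or a table-owning process to close the register/lookup race):
>
>   ensure_open(Id) ->
>       case global:whereis_name({record, Id}) of
>           undefined ->
>               {ok, Pid} = live_record:start_link(Id, deps_of(Id)),
>               yes = global:register_name({record, Id}, Pid),
>               Pid;
>           Pid when is_pid(Pid) ->
>               Pid
>       end.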
>
> Making each table or type of record a table- or store-owning process and doing
> pub/sub at that level may be a golden compromise or might wind up creating
> bottlenecks. This is part of my thinking behind making the client-side code
> Erlang also, because it seems like a very smooth method of delegation. The
> only way to really know is through experimentation. I imagine that there is
> probably a golden balance somewhere in the middle, but I haven't had to locate
> it yet, and in any case I am still discovering ways to do things.
>
> One thing that is obvious, though, is that my method of writing data
> definitions could be refined a bit and interpreted to generate much of the
> simpler record Erlang code, the SQL definitions, and probably the ASN.1
> definitions also (btw, it turns out things like JSON are not sufficient for
> doing business data reliably, and XML is a different sort of nightmare --
> boring, old, stodgy ASN.1 is the right tool in this case). Leveraging the
> information in the data definitions more completely would make experimentation
> a lot faster. As with anything else, it's a time/money tradeoff, and one I am
> not in a position to make in my favor yet.
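>
> Purely as an illustration of what I mean by "interpreting" a data
> definition, something as small as the following term could be walked to emit
> the -record() declaration, the SQL DDL and the ASN.1 type (field names and
> type atoms are invented):
>
>   {person,
>    [{id,          integer, [primary_key]},
>     {family_name, utf8,    [not_null]},
>     {given_name,  utf8,    []},
>     {born,        date,    []}]}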
>
>> Now another question... given this “actor” based approach, I am having
>> difficulty figuring out a proper way of dealing with process lifetimes.
>> How would you do this in practice? Manually, or implement simple garbage
>> collection, reference counting, ...?
>
> Whenever a record's subscription count hits zero, it retires. This is a form
> of reference counting that is a natural outcome of the subscription "open a
> document" and "close a document/crash/drop connection" actions. So far this
> has been entirely adequate.
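>
> In code terms it is no more than an extra clause on the hypothetical
> live_record sketch above: when the last subscriber goes, the process stops.
>
>   handle_call({unsubscribe, Pid}, _From, State = #state{subs = Subs0}) ->
>       case lists:delete(Pid, Subs0) of
>           []   -> {stop, normal, ok, State#state{subs = []}};
>           Subs -> {reply, ok, State#state{subs = Subs}}
>       end.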
>
> I've written this in a bit of a rush; hopefully I explained more than I
> confused. There are a million more things to discover about how to make a
> system like this do more of the heavy lifting and deliver better value to users.
>
> -Craig