[erlang-questions] Using processes to implement business logic
zxq9
zxq9@REDACTED
Fri Jan 30 13:27:35 CET 2015
On Friday, 30 January 2015 at 11:52:27, Camille Troillard wrote:
> Hi,
>
> I am looking for opinions about using processes to encapsulate the state of
> business entities.
>
> It looks like, in the context of our problem, we would gain more by
> implementing our domain model with processes rather than simple Erlang
> records. It also appears that processes would act as a nice cache layer in
> front of the persistent storage.
>
> So, what are your experiences?
I've found processes to be extremely flexible with regard to representing
business entity state. There are a few things to consider before you can gain
much from process-based encapsulation of state, though.
A determination must be made about what a useful granularity is for your
business entities. In the case I deal with, I have found it useful to start
with a completely normalized relational data schema and build a hierarchy of
structures useful to users up from there. It looks something like this (with
a record-definition sketch after the list):
* Elementary record
- As low as it gets; a schema of normalized relations.
* Record
- Practically useful assembly of elementary records and other records.
* Document
- Wraps whatever level of record the user wants to deal with in display,
export and editing rules. This is the essence of a client-side application
(regardless of what language or paradigm the client is written in -- I've
toyed with wxErlang for this, but Qt has sort of been a necessity because of
the ease of cross-platform deployment).
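To make the layering concrete, here is a minimal sketch in Erlang record
definitions. All the names (person_name, person, document) are invented for
illustration, not taken from a real schema:

    %% Elementary record: one fully normalized relation, mostly keys.
    -record(person_name, {person_id, lang, script, name}).

    %% Record: a practically useful assembly of elementary records.
    -record(person, {id, names = [], addresses = []}).

    %% Document: a record wrapped in the display/export/editing rules
    %% the client applies when a user works with it.
    -record(document, {record, display_rules = [], edit_rules = []}).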
One form of business logic is encapsulated by the relational rules and the
shape of the relational schema. A choice has to be made whether changes
cascade at the database level or within application-server code. There is no
"right" answer to the question of at what level to propagate data updates,
but declaring the data to follow a strict, normalized relational schema is
important if extensibility is a concern (and with business data it always
is).
My choice has been to propagate change notifications among record processes
to whatever other processes or external entities have subscribed to them, but
to let the database schema cascade changes to foreign keys on its own
(normalized relations consist primarily of primary and foreign keys, anyway).
This choice forces a commitment to having client code (or record processes)
calculate derived values, using the database rules only for data integrity
enforcement.
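A minimal sketch of such a record process as a gen_server, with invented
names (record_proc, and persist/2 standing in for the real storage call):

    -module(record_proc).
    -behaviour(gen_server).
    -export([start_link/1, subscribe/2, update/2]).
    -export([init/1, handle_call/3, handle_cast/2]).

    start_link(RecordId) ->
        gen_server:start_link(?MODULE, RecordId, []).

    subscribe(Pid, Subscriber) ->
        gen_server:call(Pid, {subscribe, Subscriber}).

    update(Pid, Change) ->
        gen_server:cast(Pid, {update, Change}).

    init(RecordId) ->
        {ok, #{id => RecordId, subs => []}}.

    handle_call({subscribe, Sub}, _From, State = #{subs := Subs}) ->
        {reply, ok, State#{subs := [Sub | Subs]}}.

    handle_cast({update, Change}, State = #{id := Id, subs := Subs}) ->
        ok = persist(Id, Change),             %% storage stand-in
        [Sub ! {record_changed, Id, Change} || Sub <- Subs],
        {noreply, State}.

    persist(_Id, _Change) -> ok.              %% hypothetical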
Where before I had used materialized views to cache sets of data, I now use
mnesia as a denormalized cache. Mnesia's disc-backed tables are limited to
2GB (a limit of the dets files underlying disc_only_copies tables), but this
has not been a practical limitation within a single
installation/client site (so long as BLOBs are stored as files, and only
references to them are stored in the database rows). If this ever does become
a limitation a caching strategy other than general memoization will become
useful, but I've not hit any walls yet.
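For illustration, such a cache table might be declared like this, assuming
mnesia is already started and using an invented doc_cache record; note that
the row stores only a path to the BLOB, never the BLOB itself:

    -record(doc_cache, {doc_id, fields, blob_path}).

    init_cache() ->
        {atomic, ok} =
            mnesia:create_table(doc_cache,
                                [{ram_copies, [node()]},
                                 {attributes, record_info(fields, doc_cache)}]),
        ok.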
Records that a user has "open" or "active" are instantiated as processes.
These records subscribe to the records they depend on, so they can receive
and push updates among one another. In this way User A using Client A can
update some
data element X, and X will notify its underlying record process, which will
propagate the change across the system downward to the underlying records and
database, and upward to User B on Client B who has a related document open.
This can take some getting used to for users who have grown accustomed to the
typical "refresh the web page to see updates" form of editing or single-user
business applications. (At the moment these living records live on the
application server, but the work could be delegated if the clients were also
Erlang nodes (though not part of the server's cluster) and each table's
owning process managed the subscription system instead of each record. I just
haven't gotten that far yet.)
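In terms of the record_proc sketch above, opening a document might look
something like this (person_42 and the change term are invented):

    %% A document process subscribes itself to the record it displays,
    %% then receives pushed changes from then on.
    {ok, Rec} = record_proc:start_link(person_42),
    ok = record_proc:subscribe(Rec, self()),
    record_proc:update(Rec, {set, family_name, <<"Tanaka">>}),
    receive
        {record_changed, person_42, Change} ->
            io:format("record changed: ~p~n", [Change])
    end.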
This sort of data handling requires a lot of consideration about what
"normalization" means, and also care when defining the record schemas. From
records, though, it is easy to write OOP GUI code, or process-based wxErlang
GUI code (which is easier, but harder to deploy on Windows, and impossible on
mobile just now) without your head exploding, and it gets you past the
"Object-Relational Mismatch" problem. The tradeoff is all the thought that
goes into
both the relational/elementary record schema and the aggregate record schemas,
which turn out to look very different. It requires a considerable amount of
time to get folks who have only ever used an ORM framework on track with doing
Objects-as-processes/records and elementary records as normalized relations --
I have not found a magic shortcut to this yet.
You will not get the schemas right the first time, or the second. Any data
that is "just obvious" at first will probably prove to be non-trivial at its
root. That is:
- tracking people's names in different languages
- properly dealing with scripts instead of just "languages"
- doing addresses + location properly
- making the intuitive leap that families are more like contract organizations
which cover a time span instead of a simple {Husband, Wife, [Kids]} tuple
- business relationship tracking
- event timelines
- non-Western units of measure
- anything to do with calendars
- etc.
Even without all this architecture, just beginning with a relatively dirty,
denormalized schema in mnesia or ETS tables, it is possible to see how much
more interesting "live" records defined as processes that are aware of their
interdependencies can be. Combining this with a subscribe/publish model is
very natural in Erlang. But even with a smallish store of business data you
will have to find a way to distinguish between an "active" record and one that
needs to reside latent as a collection of rows in tables. If you instantiate
everything, you can quickly find yourself trying to spawn not a few tens of
thousands, but millions of processes (I think this is why you ask your next
question below).
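One way to keep the process count proportional to what is actually open is a
small registry that instantiates a record process on first open and reuses it
afterwards. A hypothetical sketch, assuming a named public ETS table
active_records created at startup:

    -module(record_registry).
    -export([open/1]).

    %% Spawn a record process on first open; reuse it afterwards.
    %% Everything absent from this table stays latent as table rows.
    open(RecordId) ->
        case ets:lookup(active_records, RecordId) of
            [{RecordId, Pid}] ->
                Pid;
            [] ->
                {ok, Pid} = record_proc:start_link(RecordId),
                true = ets:insert(active_records, {RecordId, Pid}),
                Pid
        end.

A real version would serialize opens through one process (or use
ets:insert_new/2) to avoid double-spawning, and would remove entries when a
record retires.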
Making each table or type of record a table- or store-owning process and doing
pub/sub at that level may be a golden compromise or might wind up creating
bottlenecks. This is part of my thinking behind making the client-side code
Erlang also, because it seems like a very smooth method of delegation. The
only way to really know is through experimentation. I imagine that there is
probably a golden balance somewhere in the middle, but I haven't had to locate
it yet, and in any case I am still discovering ways to do things.
One thing that is obvious, though, is that my method of writing data
definitions could be refined a bit and interpreted to generate much of the
simpler Erlang record code, the SQL definitions, and probably the ASN.1
definitions also (btw, it turns out things like JSON are not sufficient for
doing business data reliably, and XML is a different sort of nightmare --
boring, old, stodgy ASN.1 is the right tool in this case). Leveraging the
information in the data definitions more completely would make experimentation
a lot faster. As with anything else, it's a time/money tradeoff, and one I am
not in a position to make in my favor yet.
> Now another question... given this “actor” based approach, I am having
> difficulty figuring out a proper way of dealing with process lifetimes.
> How would you do this in practice? Manually, or by implementing simple
> garbage collection, reference counting, ...?
Whenever a record's subscription count hits zero, it retires. This is a form
of reference counting that is a natural outcome of the subscription "open a
document" and "close a document/crash/drop connection" actions. So far this
has been entirely adequate.
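In the record_proc sketch above, that retirement is one more clause
(unsubscribe is invented here; monitors on subscribers would cover the
crash/drop cases as well):

    %% Drop a subscriber; stop normally when none remain.
    handle_call({unsubscribe, Sub}, _From, State = #{subs := Subs}) ->
        case lists:delete(Sub, Subs) of
            []   -> {stop, normal, ok, State#{subs := []}};
            Rest -> {reply, ok, State#{subs := Rest}}
        end.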
I've written this in a bit of a rush, hopefully I explained more than I
confused. There are a million more things to discover about how to make a
system like this do more of the heavy lifting and deliver better user value.
-Craig