[erlang-questions] Using processes to implement business logic
zxq9
zxq9@REDACTED
Fri Jan 30 13:27:35 CET 2015
On Friday, 30 January 2015 at 11:52:27, Camille Troillard wrote:
> Hi,
>
> I am looking for opinions about using processes to encapsulate the state of
> business entities.
>
> It looks like, in the context of our problem, we would gain more by
> implementing our domain model with processes rather than simple Erlang
> records. It also appears that processes would act as a nice cache layer in
> front of the persistent storage.
>
> So, what are your experiences?
I've found processes to be extremely flexible with regard to representing
business entity state. There are a few things to consider before you can gain
much from process-based encapsulation of state, though.
A determination must be made about what a useful granularity is for your
business entities. In the case I deal with, I have found it useful to start
with a completely normalized relational data schema and build a hierarchy of
structures useful to users up from there. It looks something like this (with
a record-definition sketch after the list):
* Elementary record
- As low as it gets; a schema of normalized relations.
* Record
- Practically useful assembly of elementary records and other records.
* Document
- Wraps whatever level of record the user wants to deal with in display,
export and editing rules. This is the essence of a client-side application
(regardless of what language or paradigm the client is written in -- I've
toyed with wxErlang for this, but Qt has sort of been a necessity because of
the ease of cross-platform deployment).
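To make the layering concrete, here is a minimal sketch in Erlang record
definitions. All the names (person_name, person, document) are invented for
illustration, not taken from a real schema:

    %% Elementary record: one fully normalized relation, mostly keys.
    -record(person_name, {person_id, lang, script, name}).

    %% Record: a practically useful assembly of elementary records.
    -record(person, {id, names = [], addresses = []}).

    %% Document: a record wrapped in the display/export/editing rules
    %% the client applies when a user works with it.
    -record(document, {record, display_rules = [], edit_rules = []}).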
One form of business logic is encapsulated by the relational rules and the
shape of the relational schema. A choice has to be made whether changes
cascade at the database level or within application-server code. There is no
"right" answer to the question of at what level to propagate data updates,
but declaring the data to follow a strict, normalized relational schema is
important if extensibility is a concern (and with business data it always
is).
My choice has been to propagate change notifications among record processes
to whatever other processes or external entities have subscribed to them, but
to let the database schema cascade changes to foreign keys on its own
(normalized relations consist primarily of primary and foreign keys, anyway).
This choice forces a commitment to having client code (or record processes)
calculate derived values, using the database rules only for data integrity
enforcement.
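A minimal sketch of such a record process as a gen_server, with invented
names (record_proc, and persist/2 standing in for the real storage call):

    -module(record_proc).
    -behaviour(gen_server).
    -export([start_link/1, subscribe/2, update/2]).
    -export([init/1, handle_call/3, handle_cast/2]).

    start_link(RecordId) ->
        gen_server:start_link(?MODULE, RecordId, []).

    subscribe(Pid, Subscriber) ->
        gen_server:call(Pid, {subscribe, Subscriber}).

    update(Pid, Change) ->
        gen_server:cast(Pid, {update, Change}).

    init(RecordId) ->
        {ok, #{id => RecordId, subs => []}}.

    handle_call({subscribe, Sub}, _From, State = #{subs := Subs}) ->
        {reply, ok, State#{subs := [Sub | Subs]}}.

    handle_cast({update, Change}, State = #{id := Id, subs := Subs}) ->
        ok = persist(Id, Change),             %% storage stand-in
        [Sub ! {record_changed, Id, Change} || Sub <- Subs],
        {noreply, State}.

    persist(_Id, _Change) -> ok.              %% hypothetical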
Where before I had used materialized views to cache sets of data, I now use
mnesia as a denormalized cache. Mnesia's disc-backed tables are limited to
2GB (a limit of the dets files underlying disc_only_copies tables), but this
has not been a practical limitation within a single
installation/client site (so long as BLOBs are stored as files, and only
references to them are stored in the database rows). If this ever does become
a limitation a caching strategy other than general memoization will become
useful, but I've not hit any walls yet.
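For illustration, such a cache table might be declared like this, assuming
mnesia is already started and using an invented doc_cache record; note that
the row stores only a path to the BLOB, never the BLOB itself:

    -record(doc_cache, {doc_id, fields, blob_path}).

    init_cache() ->
        {atomic, ok} =
            mnesia:create_table(doc_cache,
                                [{ram_copies, [node()]},
                                 {attributes, record_info(fields, doc_cache)}]),
        ok.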
Records that a user has "open" or "active" are instantiated as processes.
These records subscribe to the records they depend on, so they can receive
and push updates among one another. In this way User A using Client A can
update some
data element X, and X will notify its underlying record process, which will
propagate the change across the system downward to the underlying records and
database, and upward to User B on Client B who has a related document open.
This can take some getting used to for users who have grown accustomed to the
typical "refresh the web page to see updates" form of editing or single-user
business applications. (At the moment these living records live on the
application server, but the work could be delegated if the clients were also
Erlang nodes (though not part of the server's cluster) and each table's
owning process managed the subscription system instead of each record. I just
haven't gotten that far yet.)
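In terms of the record_proc sketch above, opening a document might look
something like this (person_42 and the change term are invented):

    %% A document process subscribes itself to the record it displays,
    %% then receives pushed changes from then on.
    {ok, Rec} = record_proc:start_link(person_42),
    ok = record_proc:subscribe(Rec, self()),
    record_proc:update(Rec, {set, family_name, <<"Tanaka">>}),
    receive
        {record_changed, person_42, Change} ->
            io:format("record changed: ~p~n", [Change])
    end.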
This sort of data handling requires a lot of consideration about what
"normalization" means, and also care when defining the record schemas. From
records, though, it is easy to write OOP GUI code, or process-based wxErlang
GUI code (which is easier, but harder to deploy on Windows, and impossible on
mobile just now) without your head exploding, and it gets you past the
"Object-Relational Mismatch" problem. The tradeoff is all the thought that
goes into
both the relational/elementary record schema and the aggregate record schemas,
which turn out to look very different. It requires a considerable amount of
time to get folks who have only ever used an ORM framework on track with doing
Objects-as-processes/records and elementary records as normalized relations --
I have not found a magic shortcut to this yet.
You will not get the schemas right the first time, or the second. Any data
that is "just obvious" at first will probably prove to be non-trivial at its
root. That is:
- tracking people's names in different languages
- properly dealing with scripts instead of just "languages"
- doing addresses + location properly
- making the intuitive leap that families are more like contract organizations
which cover a time span instead of a simple {Husband, Wife, [Kids]} tuple
- business relationship tracking
- event timelines
- non-Western units of measure
- anything to do with calendars
- etc.
Even without all this architecture, just beginning with a relatively dirty,
denormalized schema in mnesia or ETS tables, it is possible to see how much
more interesting "live" records defined as processes that are aware of their
interdependencies can be. Combining this with a subscribe/publish model is
very natural in Erlang. But even with a smallish store of business data you
will have to find a way to distinguish between an "active" record and one that
needs to reside latent as a collection of rows in tables. If you instantiate
everything, you can quickly find yourself trying to spawn not a few tens of
thousands, but millions of processes (I think this is why you ask your next
question below).
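One way to keep the process count proportional to what is actually open is a
small registry that instantiates a record process on first open and reuses it
afterwards. A hypothetical sketch, assuming a named public ETS table
active_records created at startup:

    -module(record_registry).
    -export([open/1]).

    %% Spawn a record process on first open; reuse it afterwards.
    %% Everything absent from this table stays latent as table rows.
    open(RecordId) ->
        case ets:lookup(active_records, RecordId) of
            [{RecordId, Pid}] ->
                Pid;
            [] ->
                {ok, Pid} = record_proc:start_link(RecordId),
                true = ets:insert(active_records, {RecordId, Pid}),
                Pid
        end.

A real version would serialize opens through one process (or use
ets:insert_new/2) to avoid double-spawning, and would remove entries when a
record retires.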
Making each table or type of record a table- or store-owning process and doing
pub/sub at that level may be a golden compromise or might wind up creating
bottlenecks. This is part of my thinking behind making the client-side code
Erlang also, because it seems like a very smooth method of delegation. The
only way to really know is through experimentation. I imagine that there is
probably a golden balance somewhere in the middle, but I haven't had to locate
it yet, and in any case I am still discovering ways to do things.
One thing that is obvious, though, is that my method of writing data
definitions could be refined a bit and interpreted to generate much of the
simpler Erlang record code, the SQL definitions, and probably the ASN.1
definitions also (btw, it turns out things like JSON are not sufficient for
doing business data reliably, and XML is a different sort of nightmare --
boring, old, stodgy ASN.1 is the right tool in this case). Leveraging the
information in the data definitions more completely would make experimentation
a lot faster. As with anything else, it's a time/money tradeoff, and one I am
not in a position to make in my favor yet.
> Now another question... given this “actor” based approach, I am having
> difficulty figuring out a proper way of dealing with process lifetimes.
> How would you do this in practice? Manually, or by implementing simple
> garbage collection, reference counting, ...?
Whenever a record's subscription count hits zero, it retires. This is a form
of reference counting that is a natural outcome of the subscription "open a
document" and "close a document/crash/drop connection" actions. So far this
has been entirely adequate.
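In the record_proc sketch above, that retirement is one more clause
(unsubscribe is invented here; monitors on subscribers would cover the
crash/drop cases as well):

    %% Drop a subscriber; stop normally when none remain.
    handle_call({unsubscribe, Sub}, _From, State = #{subs := Subs}) ->
        case lists:delete(Sub, Subs) of
            []   -> {stop, normal, ok, State#{subs := []}};
            Rest -> {reply, ok, State#{subs := Rest}}
        end.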
I've written this in a bit of a rush, hopefully I explained more than I
confused. There are a million more things to discover about how to make a
system like this do more of the heavy lifting and deliver better user value.
-Craig