[erlang-questions] How to think and reason when designing process- and message-centric systems?

Fri Dec 16 05:30:15 CET 2016

There has been a lot of excellent stuff in replies so far.

On 16/12/16 1:04 AM, IRLeif wrote:
> Coming from an object-oriented and data-centric background, I have
> cognitive difficulties when it comes to conceptualizing, thinking about
> and designing systems consisting of modules, processes and key-value
> data stores.
>
> My brain reverts to thinking about classes, objects, inheritance trees,
> encapsulation and SQL-style relational data models.

The conventional way of thinking about OOP is that objects are
ways of encapsulating mutable state.  The mutability of that
state is *why* it needs encapsulation.

What's really been fascinating to watch over the years has been
the way that when OO languages like Java and C# met concurrency
for real, people began to realise that you want as much of your
data as makes sense to be IMMUTABLE.  There are some really great
blog articles about Microsoft's experimental Midori operating
system, and one of the themes is the way they kept on trying to
improve (and exploit!) support for immutable data.

The way to encapsulate data in Erlang is for a process to own it.
(Even in Java or C#, it's not good enough to know which OBJECT
might change a datum, you really want to know which THREADS
might change a datum.  Here again, Midori warped C# into something
that could *prove* that there were no data races, by limiting
access to shared mutable data.)
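
For readers coming from Java or C#, a rough analogue may help. What follows is a Python sketch, not Erlang. It assumes that a thread plus a `queue.Queue` can stand in for an Erlang process and its mailbox, and the names (`owner_loop`, `start_owner`, `call_get`) are hypothetical. Only the owning thread ever touches `store`; everyone else has to send it messages. One caveat: Erlang messages carry immutable data, whereas Python passes references to possibly mutable objects, so in this sketch the isolation holds only by convention.

```python
import queue
import threading


def owner_loop(mailbox):
    """Sole owner of `store`; other threads reach it only via messages."""
    store = {}  # mutable state, private to this thread
    while True:
        msg = mailbox.get()  # block until the next message arrives
        if msg[0] == "put":
            _, key, value = msg
            store[key] = value
        elif msg[0] == "get":
            _, key, reply_to = msg
            reply_to.put(store.get(key))  # send back the value, never the dict
        elif msg[0] == "stop":
            return


def start_owner():
    """Spawn the owner thread and return (mailbox, thread)."""
    mailbox = queue.Queue()
    thread = threading.Thread(target=owner_loop, args=(mailbox,), daemon=True)
    thread.start()
    return mailbox, thread


def call_get(mailbox, key):
    """Synchronous request/reply, similar in spirit to gen_server:call."""
    reply_to = queue.Queue(maxsize=1)
    mailbox.put(("get", key, reply_to))
    return reply_to.get()


if __name__ == "__main__":
    mailbox, thread = start_owner()
    mailbox.put(("put", "tfn", "pending"))
    print(call_get(mailbox, "tfn"))  # pending
    mailbox.put(("stop",))
    thread.join()
```

In real Erlang the same shape is a process looping on `receive`, usually packaged as a `gen_server`.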

There's nothing wrong with SQL data bases, except of course the
fact that they are a painfully bad fit for OOP.  They are very
good at what they were designed to do, and if fine-grained
access control to data is important to you -- like if you are
holding data about other people! -- then you *shouldn't* try
to think away from SQL.  However, in the data base area there
are many interesting alternatives, like graph databases (Neo4j
amongst others) and triple stores supporting RDF, OWL and
SPARQL.  (Scale?  Got it.  Performance?  Got it.  Fine-grained
security control?  Not so much.)

Inheritance has its uses.  But even in OOP these days, inheritance
of structure and/or code is deprecated by many opinionated people
in favour of interfaces, and the idea of multiple data structures
with similar interfaces is not alien to functional programming.
(Again, a Midori datum.  The C# analogue of Java's 'final' is
'sealed', and Midori ended up making classes sealed by default.
Oh, and methods were non-virtual by default, even in unsealed
classes.  Apparently they loved that C# feature; they *really*
didn't want any dynamic dispatch / overriding without fairly
major reasons.)

There are a number of things that you have learned that will
carry over quite well.  Starting with use cases is going to help.
Using a testing framework is just as good an idea in Erlang as
it is in Smalltalk (where the xUnit family of frameworks originated).

Actually, most of the mental habits you describe sound like
IMPLEMENTATION-ORIENTED thinking.  I'm suggesting taking a
higher level view.

Here's something that happened today.
My daughter is going to be doing tertiary study next year.
She needed to apply for a student loan.
So she needed to get a Tax File Number.
For which you go to the AA.
So she took
  - a paper form
  - her identification documents
  - me to make the payment
What's the work-flow here?  Is the paper form an object
which can do things if you ask it nicely?  Of course not!
The information got copied onto another form.
(The *information* matters, not the *object*.)
Her identification documents were photocopied and
the originals returned.
(The *information* matters, not the *object*.)
The copies were then scanned in again and electronic
copies sent to the capital.
(The *information* matters, not the *object*.)

It should all have been doable on-line, using the
identification numbers on the identification documents
(like driver's licence), BUT it required a known human
being to see them and verify that the pictures matched.

Eventually, some data base is going to hold
  - an SQL encoding of
  - a form typed by a human being
  - reading a scanned copy
  - of a photocopy
  - of an original document issued by the same
    government, for which the data is already
    on file in some other data base(s).

The *information* matters, but the *object* doesn't.

This is a good thing, because in a distributed system,
having multiple sites work on the same object only
makes sense for device-controlling objects that *can't*
be copied or for massive shared data objects that
*shouldn't* be copied.

You shouldn't be *deciding* which processes should communicate
with each other; that should be *forced* on you by the nature
of the information flow.
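
Here is a minimal sketch of that idea, again in Python, with threads and queues standing in for Erlang processes and mailboxes (an assumption). The stage names are hypothetical and loosely modelled on the form-copying story above. Each stage knows only its inbox and its outbox, and the wiring is simply the order in which the information has to be transformed. Every stage emits a new tuple rather than mutating what it received: the information moves, the objects don't.

```python
import queue
import threading

STOP = object()  # sentinel that marks the end of the stream


def stage(transform, inbox, outbox):
    """Receive from upstream, transform, send downstream; knows nothing else."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(transform(item))


def run_pipeline(items, transforms):
    """Wire one thread per transform; the wiring follows the data flow."""
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]), daemon=True)
        for i, f in enumerate(transforms)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stages, loosely following the form-copying story.
def transcribe(form):
    name, licence = form
    return (name.strip().title(), licence)  # copy the information into a new tuple


def verify(record):
    return record + ("verified",)  # a new tuple; the input is not modified


if __name__ == "__main__":
    print(run_pipeline([("  jane doe ", "DL123")], [transcribe, verify]))
```

Because each stage is a single thread reading a FIFO queue, items come out in the order they went in.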

The GRASP patterns from OO have some relevance.

Perhaps the biggest problem I've had is unthinking sequential
programming.  I try not to think in terms of *adding*
concurrency to something but of *not taking it away*.
I don't succeed as often as I'd like.

It's a bit like SQL data bases.  You start out with a
normalised design, and then you denormalise *carefully*
when you need the efficiency.

Again, it was a Midori lesson.  LOTS of small processes.
Heavy use of 'async/await': any time they wanted to do
a long-running thing and didn't need the result right
away, they would fork off a lightweight concurrent activity
to do it, and wait only when they really needed the result
before they could continue.
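
The fork-early, wait-late pattern can be sketched with Python's `asyncio`, used here as a stand-in for Midori's async/await (an assumption). The names `fetch_record`, `build_header` and `handle` are hypothetical. A rough Erlang equivalent would be to spawn a process and `receive` its reply only at the point where the result is needed.

```python
import asyncio


async def fetch_record(key):
    # Hypothetical long-running job (stands in for I/O or a remote call).
    await asyncio.sleep(0.1)
    return f"record:{key}"


async def build_header(key):
    # Hypothetical independent work that does not need the record.
    await asyncio.sleep(0.05)
    return f"request {key}"


async def handle(key):
    # Fork the long-running activity now; it starts at the next await.
    pending = asyncio.create_task(fetch_record(key))
    # Other async work; `pending` makes progress meanwhile.
    header = await build_header(key)
    # Wait only at the point where the result is actually needed.
    record = await pending
    return f"{header} -> {record}"


if __name__ == "__main__":
    print(asyncio.run(handle("42")))  # request 42 -> record:42
```

The forked task starts running at the first `await` in `handle`, so its wait overlaps with `build_header` instead of being serialised after it.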

As for graphics, people have proposed graphical notations
for Erlang, and Erlang has its roots in a community who
were thoroughly familiar with SDL.  I'm not sure that
there is one notation that is suitable for all people and
all problems.  (UML certainly isn't.)

> I'm afraid this
> could lead to unidiomatic Erlang system architectures and
> implementations, which would be undesirable.
>
> Here are some of the essential complexities I have difficulties grasping:
>
> A) Identifying discrete modules and processes and finding good names for
> them.
> B) Appointing supervisor and worker modules; defining process hierarchies.
> C) Deciding which processes should communicate with each other and how.
> D) Designing a sensible persistent data model with Mnesia or other NoSQL
> data models (e.g. using CouchDB).
> E) Deciding which processes should read and write persistent data records.
> F) Incorporating global modules/"shared facilities" like event handlers,
> loggers, etc.
> G) Visualizing the system architecture, processes and communication
> lines; what kind of graphics to use.
> H) Organizing source code files into separate projects and directory
> structures.
>
> Questions:
>
> 1) How do you unlearn "bad habits" from object-oriented way of thinking?
> 2) How do you think and reason about process-centric systems designs?
> 3) When designing a new system, how do you approach the above activities?
>
> I would appreciate any practical tips, examples, "mind hacks" and
> resources that might help.
>
> Kind regards,
>
> Leif Eric Fredheim