[erlang-questions] How to think and reason when designing process- and message-centric systems?

Thu Dec 15 22:22:14 CET 2016

Hi,

There's been a lot of good advice in this thread so far. I would like to
throw in my two cents regarding designing and thinking in process
structures.

Erlang is known as a highly concurrent language, and every tutorial talks a
lot about processes, so it's no surprise that working with processes sounds
very exciting (and maybe even scary sometimes) to newcomers. But I believe
in most cases you should ignore them as long as you can.

A typical server application would serve incoming requests. Signing up a
new user for example. Your system will be concurrent by simply allowing
each request to be served in parallel. So there will be one process per
request. This is something you would get typically out of the box from your
Erlang web server - no need to think much about processes so far.

Your job would be to code how to handle a request. How to sign up a new
user for example. This is often just simple sequential code: add the user
to the DB, send a welcome email, whatever. You don't need to throw in
processes just because you're working in Erlang. It's not like programming
a GPU, where you have to think hard about how to let all those tiny
processors crunch through your initially very sequential-looking
calculation. If your algorithm is best expressed as sequential code, just
write sequential code.
Your server will be fast not because it could use all CPU cores to serve
that single sign up request. It will be fast because it will be able to
serve different sign up requests on all those cores.

As you go on with your sequential code, you will sometimes encounter
problems where the need for new processes arises naturally. Then you can add
these processes, and it will all feel very obvious and easy. Some typical
situations that call for processes:

   - You have some stateful component that needs atomic updates. A process
   can hold the state and its message queue will serialise the update requests.
   - You have to perform some task as you serve the request, but it doesn't
   have to complete before sending a response to the user. Just delegate this
   asynchronous task to a different process. Typical example: logging.
   - You need to do 2 or more tasks that don't depend on each other and can
   take a long time. Just spawn a process for each task and let them run in
   parallel.
   - You have a resource like an ETS table or port that has to be around
   for a long time. You will need a long living process to own this resource.
   - You need to do some heavy (CPU or memory wise) computation, like
   resizing the avatar the new user has uploaded. You are afraid that too
   many concurrent sign-ups at the same time would overload the system.
   This is a point where your concurrent processes need to synchronise
   with each other (maybe via another process or just an ETS table) to
   limit the concurrency in your system. (Yes, this is kind of the
   opposite of the above points...)
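
A minimal sketch of two of the situations above: a process that holds state
and serialises updates through its message queue, and independent tasks
run in parallel with one process each. The module name proc_sketch and all
function names are made up for illustration; a real application would more
likely build the counter on gen_server.

```erlang
-module(proc_sketch).
-export([start_counter/0, increment/1, value/1, run_parallel/1]).

%% A process that owns a counter. Its mailbox serialises all requests,
%% so concurrent callers can never interleave inside an update.
start_counter() ->
    spawn(fun() -> counter_loop(0) end).

counter_loop(N) ->
    receive
        increment ->
            counter_loop(N + 1);
        {value, From, Ref} ->
            From ! {Ref, N},
            counter_loop(N)
    end.

%% Asynchronous update: the caller does not wait for a reply.
increment(Pid) ->
    Pid ! increment,
    ok.

%% Synchronous read: wait for the reply tagged with our unique reference.
value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive
        {Ref, N} -> N
    after 5000 ->
        exit(timeout)
    end.

%% Run independent zero-argument funs in parallel, one process each,
%% and collect the results in the order the funs were given.
run_parallel(Funs) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, F()} end),
                Ref
            end || F <- Funs],
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Because messages from one sender to one receiver arrive in order, a caller
that sends two increments and then asks for the value sees both updates.
run_parallel/1 uses spawn_link, so a crashing task takes the caller down
with it; whether that is what you want is exactly the kind of question
that comes up once processes multiply.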

And there are of course advanced situations when you can (or even need to)
write process-aware code. There are nice tricks like offloading work from
processes with a possibly long message queue to a different process with
guaranteed no messages, spawning new processes in a library to protect the
global state of the process calling into the library and so on. But in the
beginning just concentrate on the simple cases.

Once you start using more and more processes, you will have to answer the
question of what to do if one of them dies. Should asynchronous tasks still
be carried out if the initial request crashed? You will quickly learn a lot
about links and supervisors when you get to this point.
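
As a small illustration of the "what if it dies" question, here is a sketch
that runs a task in a monitored process, so the caller is told how the task
ended without dying itself. The module and function names are hypothetical.
A link (spawn_link) would instead propagate the crash to the caller, and a
supervisor would restart the child.

```erlang
-module(monitor_sketch).
-export([run_task/1]).

%% Run Fun in a separate, monitored process and report how it ended.
%% The caller survives even if Fun crashes.
%% Assumption of this sketch: Fun itself never exits with {done, _}.
run_task(Fun) ->
    {Pid, MonRef} = spawn_monitor(fun() -> exit({done, Fun()}) end),
    receive
        {'DOWN', MonRef, process, Pid, {done, Result}} ->
            {ok, Result};
        {'DOWN', MonRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

For example, monitor_sketch:run_task(fun() -> exit(boom) end) returns
{error, boom} instead of crashing the calling process.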

So my advice is to start writing the sequential part of the code, and see
what other processes emerge naturally. If you want to first design the
process structure and supervisor tree of your application, then you risk
implementing as gen_servers a lot of functionality that could be plain
sequential library code, which would only introduce unnecessary
bottlenecks into your system.
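
To make the bottleneck concrete, here is a sketch with an invented example
(normalising an email address; the module email_sketch and its functions
are hypothetical). normalize/1 is plain library code that every request
process can call directly and in parallel. normalize_via_server/1 does the
same work through a single gen_server, so all callers queue up behind one
mailbox for no benefit, since there is no state to protect.

```erlang
-module(email_sketch).
-behaviour(gen_server).

-export([normalize/1, start_link/0, normalize_via_server/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Plain sequential library code: no process, no state, no mailbox.
%% Each request process runs this itself, so calls proceed in parallel.
normalize(Email) when is_binary(Email) ->
    string:lowercase(string:trim(Email)).

%% The over-designed variant: one registered server does the same work,
%% so every call is serialised through its message queue.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

normalize_via_server(Email) ->
    gen_server:call(?MODULE, {normalize, Email}).

init([]) ->
    {ok, no_state}.

handle_call({normalize, Email}, _From, State) ->
    {reply, normalize(Email), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Both paths return the same result; only the first scales with the number
of request processes.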

Cheers,
Daniel

On Thu, 15 Dec 2016 at 17:56 Sergey Safarov <s.safarov@REDACTED> wrote:

> As a guide for first steps in Erlang I use
> http://learnyousomeerlang.com/contents
>
> I think it will help you.
>
> On Thu, 15 Dec 2016, 18:33 Loïc Hoguin <essen@REDACTED> wrote:
>
> Welcome,
>
> On 12/15/2016 01:04 PM, IRLeif wrote:
> > Coming from an object-oriented and data-centric background, I have
> > cognitive difficulties when it comes to conceptualizing, thinking about
> > and designing systems consisting of modules, processes and key-value
> > data stores.
> >
> > My brain reverts to thinking about classes, objects, inheritance trees,
> > encapsulation and SQL-style relational data models. I'm afraid this
> > could lead to unidiomatic Erlang system architectures and
> > implementations, which would be undesirable.
>
> That's what brains do, unfortunately.
>
> Let me stop you there by correcting one small thing: SQL-style
> relational data models can still apply depending on the system you are
> building. PostgreSQL usage is not uncommon in the Erlang world, at least
> for small to medium systems.
>
> Classes, objects, and so on, on the other hand, do not exist, and
> thinking in terms of objects will have a negative impact on the design
> of your system.
>
> How do you solve this? Well it depends on your experience outside of OO.
> Here are a few early tips, apply the relevant ones:
>
> - Have you used a non-OO imperative language? C for example. In these
> languages, you only have functions. In Erlang, you only have functions.
> Erlang is closer to C than OO, so switch your brain to C-style
> programming instead of OO.
>
> If necessary, spend a few hours writing a short C program (or another
> language as long as you don't use/write OO code) and then switch to
> Erlang. This will set your mind on the right path.
>
> - Have you ever heard of the best practice that dictates that variables
> should not be reused? Erlang basically enforces that.
>
> When writing that short program, make sure you do not reuse variables.
>
> - Have you ever written code that recurses over directories to, for
> example, find files matching a pattern? You can't do that with only
> loops, regardless of the language.
>
> Have the short program use recursion instead of loops.
>
> After doing this your mind will be very close to how it needs to be to
> write sequential Erlang programs. The rest is minor syntax differences
> like commas and semicolons or uppercase first letter for variable names.
>
> 95% of the Erlang code you write is short sequential programs. If you
> have experience with microservices, then think of Erlang processes as
> microservices. If you have experience with client/server development,
> then think of them all as servers that communicate with each other in a
> mostly client/server fashion. The only difference is that they
> communicate using Erlang messages.
>
> It definitely helps to reimplement something like a gen_server or
> gen_statem as you will better grasp links, monitors and timeouts. These
> are useful to have solid systems; although you can do mostly without and
> still get results.
>
> > Here are some of the essential complexities I have difficulties grasping:
> >
> > A) Identifying discrete modules and processes and finding good names for
> > them.
>
> There are basically two kinds of modules in Erlang: those implementing a
> process, and those containing useful functions. For the latter it should
> be obvious: group all the related functions together. For the former,
> you are almost required to put them in a module as you implement
> behaviours, so no big trouble there.
>
> As for naming, well, good luck with that.
>
> I generally follow this pattern: <ns>_<name>[_<suffix>] where <ns> is a
> namespace I put in all modules of my application (often the name of the
> application), <name> is the descriptive name (for example 'router' or
> 'http') and <suffix> is the type of the module, for example 'server',
> 'sup' for supervisor, 'h' for Cowboy handler, 't' for a module
> containing functions for converting from/to and manipulating a type, and
> so on.
>
> > D) Designing a sensible persistent data model with Mnesia or other NoSQL
> > data models (e.g. using CouchDB).
>
> Depending on what you are doing you might not need those. And if you do,
> I would advise not jumping into it early. Start with what you know.
> Erlang is a big enough change as it is.
>
> > E) Deciding which processes should read and write persistent data
> > records.
>
> Start simple, measure, then decide what to do about it. Pools of
> connections are not uncommon, and already written for you in most cases.
>
> > F) Incorporating global modules/"shared facilities" like event handlers,
> > loggers, etc.
>
> I don't know what you mean by event handlers exactly, but as far as
> logging goes, use either error_logger or lager depending on your needs.
> Again not much to think about there.
>
> > H) Organizing source code files into separate projects and directory
> > structures.
>
> A few simple rules:
>
> - A module covers one small topic
> - A process does one thing
> - An application covers one topic and does one thing that involves
> multiple modules and processes
>
> > B) Appointing supervisor and worker modules; defining process
> > hierarchies.
> > C) Deciding which processes should communicate with each other and how.
> > G) Visualizing the system architecture, processes and communication
> > lines; what kind of graphics to use.
>
> I'll leave those for others.
>
> > Questions:
> >
> > 1) How do you unlearn "bad habits" from object-oriented way of thinking?
>
> The only way to unlearn something is to not use it for extended periods
> of time. I'm afraid that's not much help. Instead, try manipulating your
> brain into finding the right state of mind when you work on Erlang
> projects.
>
> It's the same problem you have when you play tennis and then switch to
> ping pong. The only difference is that you don't play them in the same
> environment or with the same tools so it's easier to switch from one to
> the other.
>
> Actually, changing environment could help you switch your OO
> brain off: change desk, change editor, and so on.
>
> > 2) How do you think and reason about process-centric systems designs?
> > 3) When designing a new system, how do you approach the above activities?
>
> Start small and naive, measure, then take decisions as to how the
> processes should be organized. Erlang is very easy to refactor compared
> to OO, so take advantage of that. Go for a working prototype, then
> improve upon it. How long you take to go from prototype to production
> will mostly depend on your experience.
>
> I've been interrupted about a thousand times when writing this, so
> hopefully I make some sort of sense. :-)
>
> Cheers,
>
> --
> Loïc Hoguin
> https://ninenines.eu
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions