architectural patterns for unifying state from multiple upstream processes

Thu Apr 30 23:29:45 CEST 2020

Thanks for the very insightful response!

I particularly like “make the message itself the process”, i.e. spawn a graph for each message/task.

From: Jesper Louis Andersen [mailto:jesper.louis.andersen@REDACTED] 
Sent: Wednesday, April 29, 2020 2:44 AM
To: jdmeta@REDACTED
Cc: Erlang (E-mail)
Subject: Re: architectural patterns for unifying state from multiple upstream processes

On Wed, Apr 29, 2020 at 6:47 AM <jdmeta@REDACTED> wrote:

Below is a trivial example of this scenario, where a top-level service is decomposed into xService and yService, which run concurrently.

However, while yService performs some sub-task concurrently, it ultimately needs the output of xService to finish this sub-task (in this contrived case, to know which downstream service to send a message to, where said message is some union of the work of xService and yService).

To me, this is the key dependency. Since yService needs the output of xService, there is a dependency x -> y. Hence, it is not clear to me why it has to be two services in the first place. I'd probably just spawn a process per work-unit at the top level and then let the code run sequentially inside those processes, performing the job of x, then y, then z. Assuming there are a lot of work-units coming through the system, these per-work-unit processes will eventually saturate your processing cores.
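
As a rough illustration of the process-per-work-unit idea, here is a minimal sketch in Go (chosen because footnote [0] mentions it; in Erlang the goroutine would be a spawned process). The names WorkUnit, stageX, stageY, stageZ and processAll are hypothetical stand-ins, not anything from the original code, and the string-building stages only mimic the x -> y dependency.

```go
package main

import (
	"fmt"
	"sync"
)

// WorkUnit and the stage functions below are hypothetical placeholders
// for the xService / yService / downstream steps discussed in the thread.
type WorkUnit struct {
	ID      int
	Payload string
}

// stageX does the work of "x" for one unit.
func stageX(u WorkUnit) string { return fmt.Sprintf("x(%s)", u.Payload) }

// stageY needs the output of stageX, so it simply runs after it.
func stageY(u WorkUnit, x string) string { return fmt.Sprintf("y(%s,%s)", u.Payload, x) }

// stageZ combines both results into the message sent downstream.
func stageZ(x, y string) string { return x + "+" + y }

// processAll starts one goroutine per work unit. Inside each goroutine
// the stages run sequentially; the concurrency is across units only.
func processAll(units []WorkUnit) []string {
	results := make([]string, len(units))
	var wg sync.WaitGroup
	for i, u := range units {
		wg.Add(1)
		go func(i int, u WorkUnit) {
			defer wg.Done()
			x := stageX(u)
			y := stageY(u, x)
			results[i] = stageZ(x, y) // each goroutine writes only its own slot
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	out := processAll([]WorkUnit{{1, "a"}, {2, "b"}})
	fmt.Println(out)
}
```

No message passing between x and y is needed here: the dependency is expressed as ordinary sequential code inside each unit's own process.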

In general, there are two patterns I tend to see in concurrent systems. One is where you have relatively few processes, and they then forward messages along "channels", forming a processing network that's relatively static. It is common if creating processes tends to be expensive (like the need for an OS-thread or some such, or if a new thread has considerable memory overhead). In the other, you make the message itself into a process and spawn a process per message. If creating a new process is cheap, this is more common. Erlang likes the latter approach[0]. You don't have a single graph in which multiple messages are flowing. You have a million graphs, in each of which a single message is flowing. And that graph is usually represented by tail-calling functions. Of course, you have to handle access to limited resources in such a system. But this can often be done by factoring through a manager-process of some kind, handing out tokens whenever a resource is available.

The other reason a process-per-message approach could be alluring is that it often limits message communication. That tends to make for a simpler program where fewer things can fail along the way.

[0] Any language with cheap process creation tends to be viable. Go could also be used for this approach, even though it has access to a channel primitive.
