[erlang-questions] Design question: addressing multiple e.g. gen_server instances?

Mon Nov 5 23:57:34 CET 2007

Hi,

I have a design question that has puzzled me for quite some time, and so far I have not found a satisfying example or other hint that answers it. Maybe one of you has a good idea. I will also explain how I have approached this so far. Your input is very welcome!
And sorry if this mail gets a bit long!

So my question:
---------------
How do you address multiple instances of, e.g., a gen_server that are connected in a 1:N relationship, in a consistent and intelligent way, while still following the OTP design principles?

Let me explain in more detail what I mean. Please consider a (simplified) system which, e.g., reads a log file, parses it and delivers the result to multiple other components which do whatever is necessary. You can design this as several gen_whatever components:
* a monitor: reads log files or whatever input source. -> parser
* a parser: parses the input and generates events. -> event manager
* an event manager: distributes the events to the event handlers -> event handlers
* event handlers: do whatever is necessary

That means the monitor needs to address the parser during the gen_whatever:call or cast, etc. This works fine according to the OTP design principles as long as:

a) There is a 1:1 relationship between each component and the one following it.

b) We are in two-dimensional space: there is one instance of each module. I would call it 3D if you have completely independent input sources, each with its own components behind it in the call chain, and you do not want them to interfere with each other. For example, when you process the log files of completely different web servers: if the parser for web server A's log file crashes, you do not want B's parser to be disturbed. So you start the whole set of components once for each input source. And assume you do not simply want to start multiple Erlang instances, but rather put another layer of monitor, parser, etc. on top of the old one.

As long as a) and b) are given: fine! You hardcode the module name and the registered process name into the predecessor component.
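
A minimal sketch of this hardcoded case, assuming a hypothetical monitor-side helper module and a parser registered locally as `parser` (both names are made up for illustration):

```erlang
-module(monitor_sketch).
-export([forward_line/1]).

%% Assumption: exactly one parser exists and it is registered as 'parser'.
%% The target name is fixed at compile time, which only works for 1:1 in 2D.
-define(PARSER, parser).

%% Hand one line of input to the parser without waiting for a reply.
forward_line(Line) ->
    gen_server:cast(?PARSER, {line, Line}).
```

The name `parser` is baked into the monitor, so a second parser or a second domain cannot be reached without changing the code.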

But what if there is more than one parser (1:N)? And what if you do not want to hardcode more than necessary, because you would like to treat your components as plugins: several generic monitors (generic file reader, syslog reader, etc.) which just plug into your system? Or what if one event handler plugin needs to get in contact with another one? And all this in 3D process space.

My approaches so far:
---------------------
I started by giving each layer in 3D a unique name (I call it a domain) and registering all processes with names of the form class@REDACTED@domain. Class is the type of process (monitor, parser, etc.). This name is unique as long as the same module does not appear twice in one domain.
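
Domain-qualified registration could be sketched roughly as follows. The module name `domain_name` and the simple `class_domain` atom layout are assumptions made for this example; they are not the exact name format described above.

```erlang
-module(domain_name).
-export([make/2, register_self/2, whereis_class/2]).

%% Build a registered name from a class and a domain.
%% Assumption: a plain "class_domain" layout. Atoms are never garbage
%% collected, so this only suits a small, fixed set of domains.
make(Class, Domain) when is_atom(Class), is_atom(Domain) ->
    list_to_atom(atom_to_list(Class) ++ "_" ++ atom_to_list(Domain)).

%% Register the calling process under its domain-qualified name.
register_self(Class, Domain) ->
    register(make(Class, Domain), self()).

%% Find the process of a given class within a domain (pid or undefined).
whereis_class(Class, Domain) ->
    whereis(make(Class, Domain)).
```

For example, `domain_name:make(parser, webserver_a)` yields the atom `parser_webserver_a`, which can then be used as the target of `gen_server:call/2` or `gen_server:cast/2`.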

Then I let my supervisors put the necessary call target information (module name, registered name) into the start arguments of all components.
But on the one hand this approach does not scale very well, and on the other hand you need to know exactly what a component needs to talk to before you start it. Not very good.
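
Passing the targets in through the start arguments might look like the sketch below. The module name `monitor_srv` and the convention that every target module exports a `parse/2` function are assumptions for illustration only.

```erlang
-module(monitor_srv).
-behaviour(gen_server).
-export([start_link/2, handle_line/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Name:    the name this monitor registers under.
%% Targets: [{Module, RegisteredName}] chosen by the supervisor.
start_link(Name, Targets) ->
    gen_server:start_link({local, Name}, ?MODULE, Targets, []).

%% Feed one line of input to the monitor registered as Name.
handle_line(Name, Line) ->
    gen_server:cast(Name, {line, Line}).

%% The target list is the whole server state.
init(Targets) ->
    {ok, Targets}.

handle_call(_Request, _From, Targets) ->
    {reply, {error, unsupported}, Targets}.

handle_cast({line, Line}, Targets) ->
    %% Fan out to every configured target (the 1:N case).
    %% Assumption: each target module exports parse(RegisteredName, Line).
    lists:foreach(fun({Mod, RegName}) -> Mod:parse(RegName, Line) end,
                  Targets),
    {noreply, Targets};
handle_cast(_Msg, Targets) ->
    {noreply, Targets}.

handle_info(_Info, Targets) ->
    {noreply, Targets}.

terminate(_Reason, _Targets) ->
    ok.

code_change(_OldVsn, Targets, _Extra) ->
    {ok, Targets}.
```

The monitor itself contains no hardcoded names, but whoever starts it must already know the complete target list.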

OK, one step back: I wrote an additional gen_server process, a process registry. At startup each component is only told what name it is supposed to register under, which domain it is running in and how to reach the process registry. All processes then register themselves during startup with this information, together with their class name and module name. Furthermore, each component tells the process registry what API it has to offer.
Now whenever a component needs to get in contact with another one, it contacts the process registry first, asking for directions.
I pretty soon realized that I had just doubled the number of messages being sent around, so I implemented a caching mechanism to cache process registry query results.
I am still trying to find out where best to store these results: in the process's state or in an ets table, with each process owning its own table.
And so far I have not dealt with processes residing on different nodes. I guess this will get a bit nasty.
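
A rough sketch of such a registry with a client-side cache, under several assumptions: the module name `proc_registry`, the `{Class, Domain}` key, and the choice to keep the cache as a map in the caller's own state (rather than in an ets table) are all made up for this example. It is single-node only and leaves out the API description.

```erlang
-module(proc_registry).
-behaviour(gen_server).
-export([start_link/0, register_proc/3, lookup/2, resolve/3]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Called by a component during its own startup.
register_proc(Class, Domain, Module) ->
    gen_server:call(?MODULE, {register, {Class, Domain}, Module, self()}).

%% Ask the registry for directions: {ok, {Module, Pid}} | error.
lookup(Class, Domain) ->
    gen_server:call(?MODULE, {lookup, {Class, Domain}}).

%% Client-side cache held in the caller's state (a map).
%% Returns {Result, NewCache}; only a cache miss costs a registry call.
resolve(Class, Domain, Cache) ->
    Key = {Class, Domain},
    case maps:find(Key, Cache) of
        {ok, Target} ->
            {{ok, Target}, Cache};
        error ->
            case lookup(Class, Domain) of
                {ok, Target} -> {{ok, Target}, maps:put(Key, Target, Cache)};
                error -> {error, Cache}
            end
    end.

init([]) ->
    {ok, #{}}.

handle_call({register, Key, Module, Pid}, _From, Procs) ->
    %% Monitor the process so that its entry is dropped when it dies.
    Ref = erlang:monitor(process, Pid),
    {reply, ok, maps:put(Key, {Module, Pid, Ref}, Procs)};
handle_call({lookup, Key}, _From, Procs) ->
    Reply = case maps:find(Key, Procs) of
                {ok, {Module, Pid, _Ref}} -> {ok, {Module, Pid}};
                error -> error
            end,
    {reply, Reply, Procs}.

handle_cast(_Msg, Procs) ->
    {noreply, Procs}.

handle_info({'DOWN', Ref, process, _Pid, _Reason}, Procs) ->
    {noreply, maps:filter(fun(_K, {_M, _P, R}) -> R =/= Ref end, Procs)};
handle_info(_Info, Procs) ->
    {noreply, Procs}.

terminate(_Reason, _Procs) ->
    ok.

code_change(_OldVsn, Procs, _Extra) ->
    {ok, Procs}.
```

The registry forgets a process when it dies, but a cached pid in a client goes stale in the same situation, so a real cache would also need some form of invalidation, for instance by monitoring the cached pid.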

The process registry approach also has a couple of disadvantages:
* Complexity, especially during debugging, is an order of magnitude higher now.
* Slightly lower performance.
* No guarantee that a target function can be found, although this could be overcome by first asking all components which functions they will need to access.
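
The check mentioned in the last point could be sketched as below, assuming each component can state its needs as a list of `{Module, Function, Arity}` tuples (the module name `api_check` and that list format are assumptions for this example):

```erlang
-module(api_check).
-export([missing/1]).

%% Needs: [{Module, Function, Arity}] that a component intends to call.
%% Returns the entries that no loadable module exports.
missing(Needs) ->
    [MFA || {M, F, A} = MFA <- Needs, not exported(M, F, A)].

exported(M, F, A) ->
    %% erlang:function_exported/3 only sees loaded modules, so load first.
    case code:ensure_loaded(M) of
        {module, M} -> erlang:function_exported(M, F, A);
        {error, _Reason} -> false
    end.
```

Running this once at startup turns a missing target function into an early, explicit error instead of a crash on the first call.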

But my main problem is that I am simply too inexperienced with Erlang to decide whether I am on the right track or whether there is a much better approach. And I guess I am not the first one dealing with such a design question.

Your feedback is very welcome!

Regards,
Eric