The Erlang way - dynamic upgrade of a server and UBF extensions

Wed Apr 30 16:38:17 CEST 2003

Vlad wrote:
>separate the
>normal behaviour from the exceptional cases and let some kind of supervisor
>process notify clients of a process that the Pid is no longer valid. In this
>case the supervisor would be more of a proxy. Likewise, a server should not
>try to do its own load balancing, but let a specialized part of the system
>handle this case too.

This is worth thinking about, but there is a key difference from an
ordinary supervisor.  A supervisor watches other processes and tries to
right them when they fall down.  It works behind the curtain, and no
other process knows about it.

In the client / server approach, the supervisor would have to notice
that something is wrong and then somehow step into the conversation.
How does the supervisor initiate a conversation with a client that
most likely doesn't accept connections?  Similarly, handing load
balancing to a separate process only pushes the single point of
failure up one level of processes; it does not eliminate it.

I like the idea of using a "packet router" to receive the protocol:
run several of them, let the client try another one on failure, and
provide some sort of directory assistance, especially for when the
client discovers that none of the new servers implement the latest
protocol properly.
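A minimal sketch of the client side of this idea: the client walks an ordered list of packet routers and moves on to the next one when connecting, sending or receiving fails. The module name `router_client`, the `{packet, 4}` framing and the use of `term_to_binary/1` as the wire format are assumptions for illustration only; this is not UBF, and the directory-assistance part is left out.

```erlang
-module(router_client).
-export([request/3]).

%% Try each {Host, Port} router in order and return the first reply.
%% Hypothetical wire format: 4-byte length header + term_to_binary/1.
request([], _Req, _Timeout) ->
    {error, no_router_available};
request([{Host, Port} | Rest], Req, Timeout) ->
    Opts = [binary, {packet, 4}, {active, false}],
    case gen_tcp:connect(Host, Port, Opts, Timeout) of
        {ok, Sock} ->
            Result = case gen_tcp:send(Sock, term_to_binary(Req)) of
                         ok -> gen_tcp:recv(Sock, 0, Timeout);
                         SendError -> SendError
                     end,
            gen_tcp:close(Sock),
            case Result of
                {ok, Bin} -> {ok, binary_to_term(Bin)};
                %% Send or receive failed: fall through to the next router.
                {error, _} -> request(Rest, Req, Timeout)
            end;
        {error, _} ->
            %% Router unreachable: fall through to the next router.
            request(Rest, Req, Timeout)
    end.
```

Blindly retrying on the next router is only safe for requests that can be repeated; a request that half-succeeded on the first router would be sent twice.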

I guess one way to do the supervisor approach is to follow Joe's
analogy of an office manager who "fires" the failing server and
"hires" a stand-in.  You need a packet router to do that seamlessly,
though:

1) Client contacts packet router, handshakes, makes request
2) Packet router passes it to Server A, which responds
3) Client makes 2nd request
4) Packet router gives it to Server A, A fails
5) Supervisor notices and tells the packet router to move
       Server A to a non-preferred list
6) Packet router re-routes request to Server B
7) Client receives response
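The seven steps can be sketched with plain Erlang processes standing in for the packet router and the servers. This is a hypothetical illustration, not an existing library: for brevity the router monitors the servers itself instead of being told by a separate supervisor, it assumes a failing server crashes rather than hangs, and a demoted server just sits in the non-preferred list until something restores it. The router relays every reply itself, so the client never talks to Server A or B directly.

```erlang
-module(packet_router).
-export([start/1, request/2, demo/0]).

%% Start a router over server pids listed in order of preference.
start(Servers) ->
    spawn(fun() -> loop(Servers, []) end).

%% Client side (steps 1 and 3): send a request, wait for the relayed reply.
request(Router, Req) ->
    Ref = make_ref(),
    Router ! {request, self(), Ref, Req},
    receive
        {reply, Ref, Reply} -> Reply
    after 5000 ->
        {error, timeout}
    end.

loop(Preferred, NonPreferred) ->
    receive
        {request, Client, Ref, Req} ->
            {Reply, P2, N2} = dispatch(Req, Preferred, NonPreferred),
            %% Step 7: the router answers the client itself in every case.
            Client ! {reply, Ref, Reply},
            loop(P2, N2)
    end.

%% Steps 2 and 4 to 6: try servers in order; a server that dies is moved
%% to the non-preferred list and the request is re-routed to the next one.
dispatch(_Req, [], NonPreferred) ->
    {{error, no_server}, [], NonPreferred};
dispatch(Req, [S | Rest], NonPreferred) ->
    MRef = erlang:monitor(process, S),
    S ! {call, self(), MRef, Req},
    receive
        {MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {Reply, [S | Rest], NonPreferred};
        {'DOWN', MRef, process, S, _Reason} ->
            dispatch(Req, Rest, NonPreferred ++ [S])
    end.

%% Toy server: replies {Name, Req}; server a crashes when asked 'second'.
server(Name) ->
    spawn(fun() -> server_loop(Name) end).

server_loop(Name) ->
    receive
        {call, From, Tag, Req} ->
            case {Name, Req} of
                {a, second} -> exit(failed);
                _ -> From ! {Tag, {Name, Req}}, server_loop(Name)
            end
    end.

demo() ->
    A = server(a),
    B = server(b),
    Router = start([A, B]),
    First = request(Router, first),    % answered by A
    Second = request(Router, second),  % A crashes, B answers
    {First, Second}.
```

`packet_router:demo()` walks through steps 1 to 7: Server A answers the first request, crashes on the second, and the router re-routes that request to Server B, so the result is `{{a, first}, {b, second}}`.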

This probably means the packet router has to return the response
to the client in all cases; otherwise you may lose your grip on
the socket along with the failed process.

Has anyone tried keeping a copy of the socket in two processes,
where one owns it and fails, and the other then retakes control
with controlling_process and hands the socket off to a third
process to finish the task?
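One constraint matters for this question. The `gen_tcp` documentation says only the current controlling process may call `gen_tcp:controlling_process/2` (any other caller gets `{error, not_owner}`), and as far as I know the socket port is closed when its controlling process exits. So a second process holding a copy of the socket cannot take it over after the owner has died. What does work is for the owner to hand the socket off while it is still alive, as in this sketch. The module name `handoff` and its helpers are hypothetical, and the socket runs in passive mode (`{active, false}`) so no incoming data is left sitting in the old owner's mailbox.

```erlang
-module(handoff).
-export([demo/0]).

%% Accept one connection, show that a non-owner cannot take the socket,
%% then let the live owner hand it to a worker that echoes one packet.
demo() ->
    {ok, LSock} = gen_tcp:listen(0, [binary, {packet, 4},
                                     {active, false}, {reuseaddr, true}]),
    {ok, Port} = inet:port(LSock),
    Parent = self(),
    %% Client: connect, send one packet, report the echoed reply.
    spawn(fun() ->
              {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                                        [binary, {packet, 4}, {active, false}]),
              ok = gen_tcp:send(C, <<"ping">>),
              {ok, Reply} = gen_tcp:recv(C, 0, 2000),
              Parent ! {client_got, Reply},
              gen_tcp:close(C)
          end),
    {ok, Sock} = gen_tcp:accept(LSock, 2000),   % this process now owns Sock
    Worker = spawn(fun worker/0),
    %% A process that does not own the socket is refused.
    Stranger = spawn(fun() ->
                         receive {try_take, S, From} ->
                             From ! {stranger, gen_tcp:controlling_process(S, self())}
                         end
                     end),
    Stranger ! {try_take, Sock, self()},
    NotOwner = receive {stranger, R} -> R after 2000 -> timeout end,
    %% Only the current owner may hand the socket on.
    ok = gen_tcp:controlling_process(Sock, Worker),
    Worker ! {take, Sock},
    Echo = receive {client_got, E} -> E after 2000 -> timeout end,
    gen_tcp:close(LSock),
    {NotOwner, Echo}.

%% Worker: once given the socket, echo one packet back and close it.
worker() ->
    receive
        {take, Sock} ->
            {ok, Data} = gen_tcp:recv(Sock, 0, 2000),
            ok = gen_tcp:send(Sock, Data),
            gen_tcp:close(Sock)
    end.
```

`handoff:demo()` should return `{{error, not_owner}, <<"ping">>}`: the stranger's takeover attempt is refused, while the handoff from the live owner lets the worker finish the conversation.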

jay