[erlang-questions] high availability the erlang way

Thu May 24 20:21:43 CEST 2007

Comments inline.

-- p

On Thu, 24 May 2007, RCB wrote:

> Paul,
>
> The way that I've done this is fairly simple. Let's say service B runs its
> own application. The service that you're providing on service B might be
> made available using a single gen_server, and that gen_server will sit in a
> supervision tree within that application. What is done from this point is
> to have that particular gen_server's init/1 function use pg2 to announce
> that it is online and ready to serve requests (of course, do this after any
> initialization that is required). It'll make calls like
> pg2:create(service_name), pg2:join(service_name, self()).
>
> The clients will communicate with this service by first getting the eligible
> PIDs from pg2, and then using gen_server:call() to send a request to it:

Awesome.

>
> client:
>
> Spid = pg2:get_closest_pid(service_name),
> Response = gen_server:call(Spid, {this_is_a_request, Param1, Param2, etc}).
>
> This solves the problem of a node being taken down. The pg2 module handles
> the case whether the machine is brought down gracefully or not. It also
> gives you somewhat random distribution across the node(s) running that
> gen_server! You'll need to handle race condition errors with the
> gen_server:call in the clients, but that's relatively simple.

Ok; I assume you mean the race condition wherein the Spid goes down
between the call to pg2:get_closest_pid and gen_server:call; some kind
of retry-if-time-remaining strategy could work here.
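
A minimal sketch of that retry-if-time-remaining idea, under some
assumptions: the module name ha_client, the call/3 signature, and the 20 ms
pause between attempts are all made up for illustration. The lookup is
passed in as a fun so the same loop works with pg2:get_closest_pid/1 or any
other way of finding a server.

```erlang
-module(ha_client).
-export([call/3]).

%% call(Lookup, Request, BudgetMs) -> {ok, Reply} | {error, timeout}
%%
%% Lookup is a zero-arity fun returning a pid or {error, Reason}, e.g.
%%   fun() -> pg2:get_closest_pid(service_name) end
%% BudgetMs is the total time (milliseconds) allowed across all attempts.
call(Lookup, Request, BudgetMs) ->
    Deadline = erlang:monotonic_time(millisecond) + BudgetMs,
    attempt(Lookup, Request, Deadline).

attempt(Lookup, Request, Deadline) ->
    case Deadline - erlang:monotonic_time(millisecond) of
        Remaining when Remaining > 0 ->
            case Lookup() of
                Pid when is_pid(Pid) ->
                    try
                        %% Never wait longer than the time we have left.
                        {ok, gen_server:call(Pid, Request, Remaining)}
                    catch
                        %% The server died, its node went down, or the
                        %% call timed out: look up a server again.
                        exit:_Reason ->
                            pause_and_retry(Lookup, Request, Deadline)
                    end;
                {error, _Reason} ->
                    %% No group member available right now.
                    pause_and_retry(Lookup, Request, Deadline)
            end;
        _Expired ->
            {error, timeout}
    end.

%% Short fixed pause (an arbitrary choice) so we do not spin while the
%% group membership catches up with a dead process.
pause_and_retry(Lookup, Request, Deadline) ->
    timer:sleep(20),
    attempt(Lookup, Request, Deadline).
```

A client would then use something like
ha_client:call(fun() -> pg2:get_closest_pid(service_name) end, Req, 5000).
Note that a retry after a timeout may reach a second server while the first
one is still processing the request, so this is only safe for requests that
can be repeated without harm.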

> Naturally, if you need to synchronize state across the various service
> nodes, mnesia is an excellent choice.
>
> Let me know if you have additional questions.   Have fun!

I think this will work.  Another problem we have is that we need to do
non-uniform load balancing, because hardware ordered at different points
in time gives us boxes with different capacity (sometimes radically
different capacity, such as going from dual to quad core); however, since
pg2 is process id based, we might be able to handle this by spawning more
or fewer processes per node depending upon the hardware config.
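
A rough sketch of the per-node sizing idea, with everything in it assumed
for illustration: the module name weighted_workers, the "workers per
scheduler" rule, and the placeholder worker loop (in practice the workers
would be the service's gen_servers under a supervisor). The join step is
passed in as a fun, e.g. fun(Pid) -> pg2:join(service_name, Pid) end.
Since pg2:get_closest_pid/1 picks a random member when there is no local
one, a node that joins twice as many processes should see roughly twice
the requests from remote clients.

```erlang
-module(weighted_workers).
-export([worker_count/1, start_workers/2]).

%% Hypothetical sizing rule: PerScheduler workers for each online
%% scheduler (by default the VM runs one scheduler per logical core).
worker_count(PerScheduler) when is_integer(PerScheduler), PerScheduler > 0 ->
    PerScheduler * erlang:system_info(schedulers_online).

%% Start N workers and register each one with the group via Join.
%% Returns the list of worker pids.
start_workers(Join, N) when is_function(Join, 1), is_integer(N), N >= 0 ->
    [start_worker(Join) || _ <- lists:seq(1, N)].

start_worker(Join) ->
    Pid = spawn(fun loop/0),
    Join(Pid),
    Pid.

%% Placeholder worker: ignores everything until told to stop.
loop() ->
    receive
        stop -> ok;
        _Other -> loop()
    end.
```

On each B node this could be driven by something like
weighted_workers:start_workers(Join, weighted_workers:worker_count(1)),
so a quad-core box ends up with twice the group members of a dual-core one.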

Thanks,

-- p

>
> Rich Beerman
> Cupertino CA
> +1 408-221-2444
>
>
> On 5/24/07, Paul Mineiro <paul-trapexit@REDACTED> wrote:
> >
> > Hey, I'm just learning Erlang, and have a basic question about high
> > availability.
> >
> > I'm wondering what the "Erlang way" for providing a highly available
> > service is.  By this I mean Erlang node providing service A wants to talk
> > to a set of (N > 2, identical) Erlang nodes providing service B.  Any node
> > providing service B might be taken down for maintenance or removed
> > entirely and new nodes providing service B might be introduced for
> > scaling purposes.
> >
> > I've seen parts of the OTP that ensure that 1 copy of an application is
> > running somewhere on a set of nodes, but I haven't found anything that
> > directly addresses the above.
> >
> > I could have A talk to B via UDP and use a load balancer, but that does
> > not feel very Erlang-y.
> >
> > I do see how, by combining primitives such as nodeup/nodedown messages,
> > locally registered process names, whereis, etc., I could assemble and
> > maintain a list of providers of service B on each A node, and then
> > load balance between the providers.  Since I don't see anything that
> > does this already I think 1) I'm missing it, or 2) something about
> > Erlang makes this entire line of reasoning moot.
> >
> > Thanks in advance, Erlang high school football rulz, etc.
> >
> > -- p
