[erlang-questions] pool:pspawn and spawning at specific nodes

Tue Mar 6 14:32:57 CET 2007

Have you looked at the following family of functions?

erlang:spawn(Node, Fun)
erlang:spawn(Node, Module, Function, ArgumentList)
erlang:spawn_link(Node, Fun)
erlang:spawn_link(Node, Module, Function, ArgumentList)

If you know the node names and want to control where processes get 
spawned, this may be the way to go.
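
As a rough illustration of the spawn/4 form above, here is a minimal
sketch. The module name spawn_on_node and the functions whereami/1 and
start_on/2 are made up for this example. It assumes the module can be
loaded on the target node; otherwise the spawned process exits with undef.

```erlang
-module(spawn_on_node).
-export([start_on/2, whereami/1]).

%% Runs on the target node and reports back which node it executes on.
%% (Hypothetical helper, for illustration only.)
whereami(Parent) ->
    Parent ! {running_on, node()}.

%% Spawn whereami/1 on Node and wait up to Timeout ms for its report.
start_on(Node, Timeout) ->
    Pid = spawn(Node, ?MODULE, whereami, [self()]),
    receive
        {running_on, ReportedNode} ->
            {ok, Pid, ReportedNode}
    after Timeout ->
        {error, timeout}
    end.
```

For example, spawn_on_node:start_on(PongSrv, 1000) should report the
node name held in PongSrv if the process really ran there.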

A word of caution: the pool module may not be a good fit for this type 
of application.  First, if the nodes are not started in the same subnet, 
you need to worry about the stability of your network connections.  When 
the network is unstable, disconnected pooled nodes that were started 
with the slave module are killed, so if you want to spawn processes on 
specific nodes, you'll need to handle the case where those nodes are 
unavailable.  Additionally, in this setup, if the master node dies you 
automatically lose all the pooled slave nodes.  For database 
applications like the one you described, that may not be a good choice.
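
One way to deal with unavailable nodes is sketched below, under stated
assumptions: the module safe_spawn and the function spawn_if_up/4 are
hypothetical names. The sketch checks reachability with net_adm:ping/1
before spawning and then monitors the new process, so the caller gets a
'DOWN' message (with reason noconnection if the link to the node is
lost). The ping only narrows the window; the node can still go away
right after it answers.

```erlang
-module(safe_spawn).
-export([spawn_if_up/4]).

%% Spawn M:F(A) on Node only if Node currently answers a ping.
%% Returns {ok, Pid, MonitorRef} or {error, {nodedown, Node}}.
spawn_if_up(Node, M, F, A) ->
    Reachable = Node =:= node() orelse net_adm:ping(Node) =:= pong,
    case Reachable of
        true ->
            Pid = spawn(Node, M, F, A),
            %% The caller receives {'DOWN', Ref, process, Pid, Reason}
            %% when the process exits or its node becomes unreachable.
            Ref = erlang:monitor(process, Pid),
            {ok, Pid, Ref};
        false ->
            {error, {nodedown, Node}}
    end.
```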

Perhaps starting N independent nodes, deciding what the connectivity 
matrix needs to be, and then using the pg or pg2 modules to scale would 
be a better alternative in your case?  The pool module is intended for 
cases where you have computational processes that you want to 
distribute across independent processors, with the master node load 
balancing the requests.
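
A process-group approach could look roughly like the sketch below. Note
the assumptions: pg2 was removed in OTP 24, so this uses the newer pg
module that ships with OTP 23 and later, which is not the pg module
that existed when this message was written. The module name pg_sketch
and the group name workers are made up for illustration.

```erlang
-module(pg_sketch).
-export([demo/0]).

%% Start the default pg scope unless it is already running.
ensure_started() ->
    case pg:start(pg) of
        {ok, _Pid} -> ok;
        {error, {already_started, _Pid}} -> ok
    end.

%% Join one worker to the (hypothetical) group 'workers' and
%% return the group's current members.
demo() ->
    ok = ensure_started(),
    Worker = spawn(fun() -> receive stop -> ok end end),
    ok = pg:join(workers, Worker),
    Members = pg:get_members(workers),
    Worker ! stop,
    Members.
```

On connected nodes that run the same scope, pg:get_members/1 also
returns members that joined on the other nodes, so a caller can pick a
pid from the list without knowing which node hosts it.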

Serge

Austin Seipp wrote:
> Hi,
> 
> I've been working on a little toy erlang program to kill some time. It
> is essentially a simple "Ping, pong" program (as seen in the erlang
> documentation.) I decided to modify it so that it could do things like
> scale itself appropriately (I use pool:start to automatically start
> erlang systems based on ~/.hosts.erlang) and automatically distribute
> the 'Ping' and the 'Pong' processes amongst the slave nodes brought
> up. Here is my code:
> 
> -module(pingpong).
> -export([start/0,ping/1,pong/0,output_server/0]).
> 
> output_server() ->
>    receive
>        {format,{Str,Fmt}} ->
>            io:fwrite(standard_io,"~s ~s~n",[Str,Fmt]),
>            output_server()
>    end.
> 
> ping(0) ->
>    global:send(pongServer,finished),
>    global:send(outServer,{format,{"ping finished",""}});
> 
> ping(N) ->
>    global:send(pongServer,{ping,self()}),
>    receive
>        pong ->
>            global:send(outServer,{format,{"Ping received pong",""}})
>    end,
>    pingpong:ping(N-1).
> 
> pong() ->
>    receive
>        finished ->
>            global:send(outServer,{format,{"Pong finished",""}});
>        {ping,Ping_PID} ->
>            global:send(outServer,{format,{"Pong received ping",""}}),
>            Ping_PID ! pong,
>            pingpong:pong()
>    end.
> 
> start() ->
>    PongSrv = lists:nth(1,pool:start(ponger)),
>    PingSrv = lists:nth(1,pool:start(pinger)),
>    io:format("nodes registered: ~s and ~s~n",[PongSrv,PingSrv]),
> 
>    %register output server
>    OutSrv_PID = spawn(pingpong,output_server,[]),
>    case global:register_name(outServer,OutSrv_PID) of
>        no ->
>            io:format("could not register output server, exit~n"),
>            pool:stop(),
>            exit(output_srvr_reg_err);
>        yes -> io:format("registered output server~n")
>    end,
> 
>    %register pong server
>    PongSrv_PID = pool:pspawn(pingpong,pong,[]),
>    case global:register_name(pongServer,PongSrv_PID) of
>        no ->
>            io:format("could not register name globally: err~n"),
>            pool:stop(),
>            exit(global_name_reg_err);
>        yes ->
>            io:format("~s: successfully registered global name 'pong'~n",[node()])
>    end,
> 
>    %register ping client
>    PingSrv_PID = pool:pspawn(pingpong,ping,[1]),
>    case global:register_name(pingServer,PingSrv_PID) of
>        yes ->
>            io:format("~s: successfully registered global name 'ping'~n",[node()]);
>        no ->
>            io:format("couldn't register name globally: err~n"),
>            pool:stop(),
>            exit(global_name_reg_err)
>    end.
> 
> Here is the output:
> 
> [austin@REDACTED erlang]$ erl -sname main
> Erlang (BEAM) emulator version 5.5.3 [source] [async-threads:0] [hipe]
> [kernel-poll:false]
> 
> Eshell V5.5.3  (abort with ^G)
> (main@REDACTED)1> c(pingpong).
> {ok,pingpong}
> (main@REDACTED)2> pingpong:start().
> nodes registered: ponger@REDACTED and pinger@REDACTED
> registered output server
> main@REDACTED: successfully registered global name 'pong'
> Pong received ping
> Ping received pong
> ping finished
> Pong finished
> main@REDACTED: successfully registered global name 'ping'
> ok
> (main@REDACTED)3>
> 
> As you can see, it works fine. However, I wanted to spawn the two
> separate Ping and Pong entities on nodes *at my discretion.*
> pool:pspawn simply takes the system out of the pool with the lowest
> load (expected) and starts the new process there. So I sought to see
> if the regular spawn/4 BIF could spawn it on a node of my choice,
> here's the start() and the way I changed it:
> 
> 
> 
> And here's the output now:
> 
> [austin@REDACTED erlang]$ erl -sname main
> Erlang (BEAM) emulator version 5.5.3 [source] [async-threads:0] [hipe]
> [kernel-poll:false]
> 
> Eshell V5.5.3  (abort with ^G)
> (main@REDACTED)1> c(pingpong).
> {ok,pingpong}
> (main@REDACTED)2> pingpong:start().
> nodes registered: ponger@REDACTED and pinger@REDACTED
> registered output server
> main@REDACTED: successfully registered global name 'pong'
> main@REDACTED: successfully registered global name 'ping'
> ok
> (main@REDACTED)3>
> 
> 
> So my question really is: can you start processes on slave nodes you
> bring up via pool:start() *explicitly* (that is, you specify on what
> slave to bring the process up on) rather than letting pspawn decide,
> or must you let pool:pspawn() take care of it, or is there another
> solution?
> I wanted to see if I could write a small distributed system that
> automatically scaled across available systems like this one did, and I
> figure I could use pool:start(...) to give meaningful names to the
> Erlang systems I brought up, and bring up a certain part of the whole
> program on that node explicitly (for example, if you had a database
> server written in erlang, you may want to start several nodes, say
> 'filesystem@REDACTED' and another one called 'main@REDACTED' and have the
> filesystem@REDACTED system take care of the processes that store info
> into the filesystem, while main@REDACTED would handle incoming
> connections. pool:pspawn chooses whatever system has the least load;
> name is irrelevant, so I mean, it's not very logical to have the
> filesystem processes executing on the main@REDACTED node with the
> incoming connection code because pool:pspawn decided main@REDACTED had
> a little less load. Please also note how many computers are actually
> involved in this is irrelevant, I just want my processes spawning on
> the slave nodes I specify, and not delegate it to pool:pspawn.)
> 
> 
> Any recommendations? I suppose this setup would do, but it's not
> exactly the ideal scenario here.
> --
> - Austin