[erlang-questions] Discussion and proposal regarding rpc scalability

Chandru <>
Thu Feb 11 22:57:34 CET 2016


Here is an alternative suggestion based on my experience of implementing
our own rpc mechanism precisely because of the problems you mention. I
alluded briefly to this during my CodeMesh presentation last year as well (
http://www.codemesh.io/static/upload/media/14478899074266codemesh2015.pdf)
- slides 30-33

The idea is to remove the use of rex completely, and extend the RPC
mechanism with these features.

* Named RPC endpoints which allow connections to multiple nodes
So suppose you have three Erlang nodes which offer the same service. It
would be nice to have a single RPC endpoint called DB, which sets up and
maintains connections to all three nodes, and to be able to invoke it as
shown below. This would also give you your application-specific RPC
connections.

rpc:call('DB', some_module, some_function, [args,...])

* Load balancing between these connections using different load balancing
strategies

* Seamless failover so that clients aren't aware that one of the nodes is
down (if you really wanted to know, then you could use
erlang:monitor_node/2,3)

* Ability to use multiple TCP connections behind the same endpoint (and
this could be easily extended to use TLS instead)
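To make the idea concrete, here is a minimal sketch of such a named endpoint. Everything here (the module name rpc_endpoint, the round-robin strategy, the single-retry failover) is illustrative and my own, not part of OTP or of the implementation I described above:

```erlang
%% Illustrative sketch only: a named RPC endpoint that load-balances
%% plain rpc:call/4 over several equivalent nodes. Clients address the
%% endpoint by name and never see individual node names.
-module(rpc_endpoint).
-behaviour(gen_server).
-export([start_link/2, call/4]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(Name, Nodes) ->
    gen_server:start_link({local, Name}, ?MODULE, Nodes, []).

call(Name, M, F, A) ->
    gen_server:call(Name, {call, M, F, A}).

init(Nodes) ->
    {ok, Nodes}.  %% state is the node list, in round-robin order

handle_call({call, M, F, A}, _From, [Node | Rest]) ->
    case rpc:call(Node, M, F, A) of
        {badrpc, _Reason} when Rest =/= [] ->
            %% crude failover: retry once on the next node
            {reply, rpc:call(hd(Rest), M, F, A), Rest ++ [Node]};
        Result ->
            %% rotate the list so the next call hits the next node
            {reply, Result, Rest ++ [Node]}
    end.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

With that in place, rpc_endpoint:start_link('DB', Nodes) followed by rpc_endpoint:call('DB', some_module, some_function, Args) gives the shape of the API I have in mind; a real implementation would also monitor the nodes and support pluggable balancing strategies.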

I had implemented all this at my previous place of work and we had a lot of
success with it. This was done as a separate application obviously, but it
would be good to finally get rid of rex and its associated problems.

cheers,
Chandru




On 11 February 2016 at 21:04, José Valim <>
wrote:

> Hello everyone,
>
> I was reading the publication "Investigating the Scalability Limits of
> Distributed Erlang
> <http://www.dcs.gla.ac.uk/~amirg/publications/DE-Bench.pdf>" and one of
> the conclusions is:
>
> *> We observed that distributed Erlang scales linearly up to 150 nodes
> when no global command is made. Our results reveal that the latency of rpc
> calls rises as cluster size grows. This shows that spawn scales much better
> than rpc and using spawn instead of rpc in the sake of scalability is
> advised. *
>
> The reason why is highlighted in a previous section:
>
> *> To find out why rpc’s latency increases as the cluster size grows, we
> need to know more about rpc. (...) There is a generic server process (gen
> server) on each Erlang node which is named rex. This process is responsible
> for receiving and handling all rpc requests that come to an Erlang node.
> After handling the request, generated results will be returned to the
> source node. In addition to user applications, rpc is also used by many
> built-in OTP modules, and so it can be overloaded as a shared service.*
>
> In other words, the more applications we have relying on rpc, the more
> likely rpc will become a bottleneck and increase latency. I believe we have
> three options here:
>
> 1. Promote spawn over rpc, as the paper conclusion did (i.e. mention spawn
> in the rpc docs and so on)
> 2. Leave things as is
> 3. Allow "more scalable" usage of rpc by supporting application specific
> rpc instances
>
> In particular, my proposal for 3 is to allow developers to spawn their own
> rpc processes. In other words, we can expose:
>
> rpc:start_link(my_app_rpc) %% start your own rpc
>
> rpc:call({my_app_rpc, nodename}, foo, bar, [1, 2, 3]) %% invoke your own
> rpc at the given node
>
>
> This is a very simple solution that moves the bottleneck away from rpc's
> rex process since developers can place their own rpc processes in their
> application's tree. The code changes required to support this feature are
> also minimal and are almost all at the API level, i.e. support a tuple where
> today a node is expected or allow the name as argument, mimicking the same
> API provided by gen_server that rpc relies on. We won't change
> implementation details. Finally, I believe it will provide a more
> predictable usage of rpc.
>
> Feedback is appreciated!
>
> *José Valim*
> www.plataformatec.com.br
> Skype: jv.ptec
> Founder and Director of R&D
>
> _______________________________________________
> erlang-questions mailing list
> 
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
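For reference, a rough sketch of what the quoted proposal could look like. The module below and its names are purely illustrative (they are not actual OTP code); the point is that {Name, Node} is standard gen_server addressing, so an application-specific rpc server needs no new distribution machinery:

```erlang
%% Illustrative sketch of an application-specific rpc server: a
%% gen_server playing the role of a private rex, addressed with the
%% standard {Name, Node} tuple that gen_server:call/2 already accepts.
-module(my_app_rpc).
-behaviour(gen_server).
-export([start_link/1, call/4]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(Name) ->
    gen_server:start_link({local, Name}, ?MODULE, [], []).

%% Requests go to the application's own server on the target node
%% instead of the shared rex process.
call({Name, Node}, M, F, A) ->
    gen_server:call({Name, Node}, {call, M, F, A}).

init([]) ->
    {ok, #{}}.

handle_call({call, M, F, A}, _From, State) ->
    %% catch so a crashing callee does not take the server down
    {reply, (catch apply(M, F, A)), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Usage would mirror the snippet in the quoted mail: my_app_rpc:start_link(my_app_rpc) in the application's supervision tree, then my_app_rpc:call({my_app_rpc, Node}, foo, bar, [1, 2, 3]) from clients.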

