pipe messages sent between different nodes to boost speed
Roberto Ostinelli
roberto@REDACTED
Tue Jun 9 14:13:46 CEST 2009
Dear all,
Two months ago I started an experiment to increase the speed of
messages sent between different Erlang nodes. I seem to have found a
way to increase this speed considerably, up to 3 times the native
Erlang speed, and would love to hear your feedback on this.
The idea is quite simple: queue all messages sent from one node to
another, and send them in groups. The whole concept is therefore to
have a gen_server, called 'qr', running on every node where message
passing takes place. A single process on node A sends a message to a
process on node B and is relayed by the two 'qr' servers: process on
node A => 'qr' on node A => 'qr' on node B => process on node B.
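
To give an idea of the shape of the mechanism, here is a stripped-down
sketch of what such a relay could look like. This is not the actual
code from the zip below; the 10 ms flush interval, the use of dict and
the function names are just placeholders to illustrate the
queue-and-flush idea:

    %% minimal sketch of a per-node relay that batches outgoing
    %% messages and forwards them to the remote 'qr' in groups
    -module(qr).
    -behaviour(gen_server).

    -export([start_link/0, send/2]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
             terminate/2, code_change/3]).

    -define(FLUSH_INTERVAL, 10).  %% ms between flushes (placeholder value)

    %% start the relay, registered locally as 'qr'
    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    %% queue Msg for the process Dest (a pid or registered name) on Node
    send({Dest, Node}, Msg) ->
        gen_server:cast(?MODULE, {queue, Node, Dest, Msg}).

    init([]) ->
        erlang:send_after(?FLUSH_INTERVAL, self(), flush),
        {ok, dict:new()}.  %% state: Node -> [{Dest, Msg}]

    handle_cast({queue, Node, Dest, Msg}, Queues) ->
        {noreply, dict:append(Node, {Dest, Msg}, Queues)};
    handle_cast({batch, Batch}, Queues) ->
        %% a batch arriving from a remote 'qr': deliver each message locally
        lists:foreach(fun({Dest, Msg}) -> Dest ! Msg end, Batch),
        {noreply, Queues}.

    handle_info(flush, Queues) ->
        %% forward every queued group to the 'qr' on its destination node
        lists:foreach(
          fun({Node, Batch}) ->
                  gen_server:cast({?MODULE, Node}, {batch, Batch})
          end, dict:to_list(Queues)),
        erlang:send_after(?FLUSH_INTERVAL, self(), flush),
        {noreply, dict:new()};
    handle_info(_Info, Queues) ->
        {noreply, Queues}.

    handle_call(_Request, _From, Queues) ->
        {reply, ok, Queues}.

    terminate(_Reason, _Queues) ->
        ok.

    code_change(_OldVsn, Queues, _Extra) ->
        {ok, Queues}.

The real implementation obviously needs more care (flushing when a
group reaches a certain size, handling remote nodes going down, etc.),
but the above shows the basic relay.
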
This is something that is generally taken care of at a lower level of
the implementation (TCP), with algorithms such as Nagle's, but I
decided to try a pipe/queuing mechanism at the Erlang application
level too, to see if I could get any improvements.
A detailed explanation of the tests and benchmarks that I've performed
is available here: http://www.ostinelli.net/boost-message-passing-between-erlang-nodes
and updated code is available here: http://www.ostinelli.net/wp-content/uploads/2009/06/erlang_mq_boost_2.zip
for you to try out on your machine.
Thanks to Ulf Wiger writing a note on the post above, I've also tried
out the undocumented dist_nodelay kernel option, which did provide
improvements, but still far from the ones I'm seeing with the 'qr'
pipe mechanism.
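
For reference, the option can be passed on the command line when
starting a node, something like:

    erl -sname nodeA -kernel dist_nodelay false

(as far as I understand, false leaves Nagle's algorithm active on the
distribution sockets, while the default true disables it; the node
name is of course just an example).
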
Please note that I'm in no way pretending to have found something
great and new. I'm posting this here just because reactions to my
linked post have mainly pointed towards telling me to perform
additional TCP optimization, but I've personally been unable to
reproduce the results of the 'qr' pipe mechanism by TCP optimization
alone. Also, the benchmark test that I've used is quite specific,
since it first sends 200,000 messages in parallel, which are then
processed sequentially on the recipient node. This is because the test
reflects a real need of an application I'm developing, where loads of
client threads have to go through the bottleneck of a single
registered process.
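
In outline, the load pattern looks roughly like this (a simplified
illustration, not the actual benchmark code from the zip; the module
and function names are made up):

    -module(bench_sketch).
    -export([start_collector/1, run/2]).

    %% on the receiving node: a single registered process that drains
    %% NumMessages messages sequentially
    start_collector(NumMessages) ->
        register(collector, spawn(fun() -> collect(NumMessages) end)).

    %% on the sending node: fire NumMessages messages in parallel,
    %% each from its own process, at the registered collector
    run(Node, NumMessages) ->
        [spawn(fun() -> {collector, Node} ! {msg, N} end)
         || N <- lists:seq(1, NumMessages)],
        ok.

    collect(0) ->
        io:format("all messages received~n");
    collect(N) ->
        receive
            {msg, _} -> collect(N - 1)
        end.
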
Therefore, any opinions on this are warmly welcome, so as to gain a
better understanding of what is going on and hopefully produce better
software.
Thank you in advance to those of you who took the time to read this
far, and even more to those who will [hopefully] give me some feedback.
Cheers,
r.