node to node message passing
Morten Krogh
mk@REDACTED
Sun Sep 12 12:48:38 CEST 2010
Hi Erlangers.
During some tests with node-to-node communication, I sent a large binary
from a process on node A to a process on another node, node B. I also sent
some smaller messages from other processes on node A to other processes on
node B. It turned out that the large message blocked the later messages.
Furthermore, it even blocked the net tick communication, so nodes A and B
disconnected from each other even though the large message was being
transferred!
After looking around a bit, I have come to the understanding that Erlang
uses one TCP connection between two nodes, and messages are sent
sequentially from the sending node A to the receiving node B.
If that is correct, I think some improvements are needed. The problem to
solve is basically that small messages, including the net tick, should get
through more or less independently of the presence of large messages.
The simplest fix would be to have several connections, but that doesn't
fully solve the problem: a large message will still take up a lot of the
hardware bandwidth, even on another TCP connection.
My suggestion is something like the following.
For communication between node A and node B, there is a process (the send
process) on each node that coordinates all messages. The send process
keeps queues of different priorities around, e.g., high, medium, and low
priority. Messages are split up into fragments of a maximum size. The
receiver's (node B's) send process assembles the fragments into the
original message and delivers it locally to the right process. The
fragments ensure that no single transfer will occupy the connection for
very long.
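As a minimal sketch of the fragmentation step (the module and function names are my own, not an existing API):

```erlang
%% Sketch: split a binary into fragments of at most MaxSize bytes,
%% and reassemble them on the receiving side.
-module(frag).
-export([split/2, join/1]).

split(Bin, MaxSize) when byte_size(Bin) =< MaxSize ->
    [Bin];
split(Bin, MaxSize) ->
    <<Chunk:MaxSize/binary, Rest/binary>> = Bin,
    [Chunk | split(Rest, MaxSize)].

join(Fragments) ->
    iolist_to_binary(Fragments).
```

Each fragment would then be enqueued individually, so a large binary never monopolizes the socket between two net ticks.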
There will be a function send_priority where the user can specify a
priority. The usual send will default to medium, say. Net tick will use
high priority, of course. Small messages that are needed to produce a web
application response can have high priority. File transfers for backup
purposes can have low priority.
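The API could look something like this (send_priority/3 and send_proc_for/1 are hypothetical names):

```erlang
%% Sketch: hypothetical API. send/2 defaults to medium priority;
%% send_priority/3 hands the message to the per-node send process.
send(Dest, Msg) ->
    send_priority(Dest, Msg, medium).

send_priority(Dest, Msg, Prio)
  when Prio =:= high; Prio =:= medium; Prio =:= low ->
    SendProc = send_proc_for(node(Dest)),  %% hypothetical registry lookup
    SendProc ! {enqueue, Prio, Dest, Msg},
    ok.
```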
The send process then switches between the queues in some way that could
be very similar to context-switching priorities.
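One way to do the switching is a weighted round-robin over the three queues; the 4:2:1 weights below are an arbitrary choice of mine:

```erlang
%% Sketch: pick the next fragment to send. Out of every 7 turns, high
%% priority gets first pick 4 times, medium 2 times, low once, but an
%% empty queue always yields to the others.
next_fragment(#{high := _, medium := _, low := _} = Queues, Turn) ->
    Order = case Turn rem 7 of
                N when N < 4 -> [high, medium, low];
                N when N < 6 -> [medium, high, low];
                _            -> [low, high, medium]
            end,
    take_first(Order, Queues).

take_first([], _Queues) ->
    empty;
take_first([Prio | Rest], Queues) ->
    case queue:out(maps:get(Prio, Queues)) of
        {{value, Frag}, Q2} -> {Frag, Queues#{Prio := Q2}};
        {empty, _}          -> take_first(Rest, Queues)
    end.
```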
More advanced, the send processes could occasionally probe the connection
with packets to estimate latency and bandwidth. Those figures could then
be used to calculate fragment sizes. High bandwidth and high latency would
call for large fragments; low bandwidth and low latency for small
fragments, for instance.
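A plausible heuristic here is the bandwidth-delay product, clamped to some bounds (the bounds below are my own assumptions):

```erlang
%% Sketch: fragment size from measured bandwidth (bytes/s) and round-trip
%% latency (s). Clamp between 4 KB and 1 MB so extreme measurements don't
%% produce absurd fragment sizes.
fragment_size(BytesPerSec, LatencySec) ->
    Bdp = round(BytesPerSec * LatencySec),
    max(4096, min(Bdp, 1024 * 1024)).
```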
There could even be a function send_estimated_transfer_time that sends a
message and has a return value of the estimated transfer time, which could
be used as a timeout in a receive loop.
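For example (send_estimated_transfer_time/2 is hypothetical; here it is assumed to return milliseconds):

```erlang
%% Sketch: use the estimate, with some slack, to bound the wait for a reply.
request(Pid, Msg) ->
    EstMs = send_estimated_transfer_time(Pid, {self(), Msg}),
    receive
        {reply, Answer} -> {ok, Answer}
    after 2 * EstMs ->
        {error, timeout}
    end.
```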
I have actually implemented my own small module for splitting messages
into fragments, and it solves the issues: net tick goes through, and small
messages can overtake large ones.
There is of course an issue when several messages share the same sending
and receiving process. Either the guaranteed message order should be given
up, or the coordinators should keep track of that as well. Personally, I
think guaranteed message order should be given up. Erlang should model the
real world as much as possible, and learn from it. In the real world, two
letters going from person A to person B can definitely arrive in the
opposite order of the one in which they were sent. And as node-to-node
communication will span larger and larger distances, it is totally
unnatural to require a certain order.
I am relatively new to Erlang and I really enjoy it. Kudos to all involved!
Cheers,
Morten Krogh.