[erlang-questions] node to node message passing

Morten Krogh mk@REDACTED
Mon Sep 13 22:41:42 CEST 2010


Hi Scott and Joe,

Sure, let us start with the disconnection, even though there is more to it than that. 

Two Mac computers on a local wireless network. The connection is slow; using the Mac Activity Monitor I see a transfer speed of around 400 kB/s (~3 Mb/s).
But note that this is a fundamental issue, I believe; it is just more apparent on slow connections.


Start two nodes, one on each computer

erl -name  'node@REDACTED'
erl -name  'node@REDACTED'

Connect them with net_adm:ping/1; nodes() then gives the right result.
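
For reference, the connection step looks something like this (the host parts of the node names are placeholders here, since the real ones are redacted above):

(node@host-two)1> net_adm:ping('node@host-ten').
pong
(node@host-two)2> nodes().
['node@host-ten']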

Create a 50 MB binary.

(node@REDACTED)7> B =  binary:copy(<<"b">>, 50000000).

Send the binary to the registered shell process on the other node.
The shell is registered as ten.

(node@REDACTED)7> {ten, 'node@REDACTED'} ! B.

Watch the network and see transfers of around 300-440 kB/s.

On the other node

(node@REDACTED)12> now().
{1284,408613,791811}

(node@REDACTED)13> 
=ERROR REPORT==== 13-Sep-2010::22:11:27 ===
** Node 'node@REDACTED' not responding **
** Removing (timedout) connection **

(node@REDACTED)13> now().
{1284,408693,688267}

The disconnect happened after around one minute, which matches the default net_ticktime of 60 seconds.
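
For reference, the gap between the two now() values above can be computed with timer:now_diff/2, which returns microseconds:

1> timer:now_diff({1284,408693,688267}, {1284,408613,791811}) / 1000000.
79.896456

So the two prompts are about 80 seconds apart, and the disconnect fell somewhere in that window, which is consistent with the 60 second default plus the time I spent typing.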

Let us check this assumption.

(node@REDACTED)14> net_kernel:set_net_ticktime(5).
change_initiated
(node@REDACTED)15> net_kernel:get_net_ticktime(). 
{ongoing_change_to,5}
(node@REDACTED)16> net_kernel:get_net_ticktime().
{ongoing_change_to,5}
(node@REDACTED)17> net_kernel:get_net_ticktime().
5

The net_ticktime is now 5 seconds.
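
As an aside, the tick time can also be set when the node starts, through the kernel application parameter (the host name is just a placeholder here):

erl -name 'node@some.host' -kernel net_ticktime 5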


(node@REDACTED)18> now().                         
{1284,408904,738430}
(node@REDACTED)19> 
=ERROR REPORT==== 13-Sep-2010::22:15:11 ===
** Node 'node@REDACTED' not responding **
** Removing (timedout) connection **

(node@REDACTED)19> now().
{1284,408913,335812}

About 8 seconds, but I was very slow at typing now().
Try again.

(node@REDACTED)20> now().
{1284,408927,35639}
(node@REDACTED)21> now().
=ERROR REPORT==== 13-Sep-2010::22:15:32 ===
** Node 'node@REDACTED' not responding **
** Removing (timedout) connection **

=ERROR REPORT==== 13-Sep-2010::22:15:32 ===
The global_name_server locker process received an unexpected message:
{{#Ref<0.0.0.155>,'node@REDACTED'},true}

{1284,408932,778710}

About 5 seconds, as expected. Anyway, net tick is not that precise; as far as I understand, ticks are sent every net_ticktime/4 seconds, so it could have been anywhere from roughly 4 to 6 seconds.



The binary was 50 MB, but I could have made it smaller, obviously.

But the real point is not just the disconnect. The real point is that large messages block other messages.

I am not giving my exact code here, but you can easily reproduce it, if it is reproducible :) A rough sketch follows the description below.

Start two processes on 'node@REDACTED', say two1 and two2.
Start two processes on 'node@REDACTED', say ten1 and ten2.


Set net_ticktime to a very high number.

Send the 50 MB binary or so from two1 to ten1.
Shortly after, send a 10 byte message from two2 to ten2.

Let ten1 and ten2 print with io:format when they receive a message.

Nothing happens for a long time while the network is busy, and then both ten1 and ten2 write their output.
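
Here is the rough sketch of the test (the module name, the registered names ten1/ten2 and the 100 ms sleep are mine, just for illustration):

%% blocktest.erl -- illustration only: two registered printer processes on the
%% receiving node, and a sender that fires one large and one small binary.
-module(blocktest).
-export([start_receivers/0, run/1]).

%% Run on the receiving node (the one hosting the ten* processes).
start_receivers() ->
    register(ten1, spawn(fun() -> printer(ten1) end)),
    register(ten2, spawn(fun() -> printer(ten2) end)),
    ok.

printer(Name) ->
    receive
        Bin when is_binary(Bin) ->
            io:format("~p got ~p bytes at ~p~n", [Name, byte_size(Bin), now()]),
            printer(Name)
    end.

%% Run on the sending node; Node is the receiving node's name.
run(Node) ->
    net_kernel:set_net_ticktime(3600),        %% very high tick time, as described above
    Big = binary:copy(<<"b">>, 50000000),     %% the 50 MB binary
    spawn(fun() -> {ten1, Node} ! Big end),                %% plays the role of two1
    timer:sleep(100),                                      %% "shortly after"
    spawn(fun() -> {ten2, Node} ! <<"0123456789">> end),   %% two2 sends 10 bytes
    ok.

The timer:sleep(100) is only there to make sure the big send is already in flight before the small one is sent.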


But does anyone actually know? Do messages queue up the way this test seems to indicate?
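
For what it is worth, here is a rough sketch of the kind of fragmenting sender I mean (this is not my actual module, just an illustration; the chunk size, message format and function names are arbitrary):

%% frag.erl -- illustration: split a big binary into chunks so that other
%% distribution messages (and the net tick) can interleave between them.
-module(frag).
-export([send/3, receiver/1]).

-define(CHUNK, 65536).  %% arbitrary fragment size

%% Send Bin in fragments to the process registered as Name on Node.
send(Name, Node, Bin) ->
    Ref = make_ref(),
    Chunks = split(Bin, []),
    Total = length(Chunks),
    send_chunks(Name, Node, Ref, 1, Total, Chunks).

send_chunks(_Name, _Node, _Ref, _Seq, _Total, []) ->
    ok;
send_chunks(Name, Node, Ref, Seq, Total, [Chunk | Rest]) ->
    {Name, Node} ! {frag, Ref, Seq, Total, Chunk},
    erlang:yield(),   %% give other senders a chance to get in between fragments
    send_chunks(Name, Node, Ref, Seq + 1, Total, Rest).

split(Bin, Acc) when byte_size(Bin) =< ?CHUNK ->
    lists:reverse([Bin | Acc]);
split(Bin, Acc) ->
    <<Chunk:?CHUNK/binary, Rest/binary>> = Bin,
    split(Rest, [Chunk | Acc]).

%% Reassembler: collects the fragments of one message (order between the same
%% pair of processes is preserved, so they arrive in sequence) and hands the
%% whole binary to Dest.
receiver(Dest) ->
    receive
        {frag, Ref, 1, Total, Chunk} ->
            collect(Dest, Ref, Total, 1, [Chunk])
    end.

collect(Dest, _Ref, Total, Total, Acc) ->
    Dest ! {assembled, list_to_binary(lists:reverse(Acc))},
    receiver(Dest);
collect(Dest, Ref, Total, Seq, Acc) ->
    receive
        {frag, Ref, Seq1, Total, Chunk} when Seq1 =:= Seq + 1 ->
            collect(Dest, Ref, Total, Seq1, [Chunk | Acc])
    end.

With something like this, a small message sent from another process can slot in between two fragments, and the tick is never stuck behind one 50 MB distribution message. The cost is the ordering issue discussed in my original mail below.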


Cheers,

Morten.








On Sep 13, 2010, at 9:55 PM, Joe Armstrong wrote:

> On Sun, Sep 12, 2010 at 12:48 PM, Morten Krogh <mk@REDACTED> wrote:
>>  Hi Erlangers.
>> 
>> During some test with node to node communication, I sent a large binary from
>> a process on node A
>> to a process on another node, node B. I also sent some smaller messages from
>> other processes on node A to other
>> processes on node B. It turned out that the large message blocked the later
>> messages. Furthermore, it even blocked
>> the net tick communication, so node A and B disconnected from each other
>> even though the large message was being transferred!
> 
> Just for clarification could you say what you mean by "large", "small"
> etc. -  I have no idea what this means - it might mean 10's of MBytes
> it might mean GBytes - without knowing I have no idea as to how
> realistic your expectations are.
> 
> /Joe
> 
>> 
>> After looking a bit around, I have come to the understanding that Erlang
>> uses one tcp connection between two nodes, and messages are sent
>> sequentially from the sending node A to the receiving node.
>> 
>> If that is correct, I think some improvements are needed.
>> 
>> The problem to solve is basically that small messages, including the net
>> tick, should get through more or less independently of
>> the presence of large messages.
>> 
>> The simplest would be to have several connections, but that doesn't fully
>> solve the problem. A large message will still take up
>> a lot of the hardware bandwidth even on another tcp connection.
>> 
>> My suggestion is something like the following.
>> 
>> For communication between node A and node B, there is a process (send
>> process) on each node, that coordinates all messages. The send process
>> keeps queues of different priorities around, e.g., a high priority, medium
>> priority and low priority. Messages are split up into fragments of
>> a maximum size. The receiver(node B) send process assembles the fragments
>> into the original message and delivers it locally to the
>> right process. The fragments ensure that no single transfer will occupy the
>> connection for very long.
>> There will be a function send_priority where the user can specify a
>> priority. The usual send will default to medium, say.
>> Net tick will use high priority, of course. Small messages that are needed
>> to produce a web application response can have high priority. File transfers
>> for backup purposes can have low priority.
>> The send process then switches between the queues in some way, that could be
>> very similar to context switching priorities.
>> 
>> More advanced, the send processes could occasionally probe the connection
>> with packets to estimate latency and bandwidth. Those figures could then be
>> used
>> to calculate fragment sizes. High bandwidth, high latency would require
>> large fragments. Low bandwidth, low latency small fragments for instance.
>> There could even be a function send_estimated_transfer_time that sends a
>> message and has a return value of estimated transfer time, which could be
>> used in
>> a timeout in a receive loop.
>> 
>> 
>> I have actually implemented my own small module for splitting messages into
>> fragments, and it solves the issues; net tick goes through, and small
>> messages can overtake large ones.
>> 
>> There is of course an issue when the sending and receiving process is the
>> same for several messages. Either the guaranteed message order should be
>> given up, or the
>> coordinators should keep track of that as well. Personally, I think
>> guaranteed message order should be given up. Erlang should model the real
>> world as
>> much as possible, and learn from it. In the real world, two letters going
>> from person A to person B, can definitely arrive in the opposite order
>> of the one in which they were sent. And as node to node communication will
>> be over larger and larger distances, it is totally unnatural to require
>> a certain order.
>> 
>> I am relatively new to Erlang and I really enjoy it. Kudos to all involved!
>> 
>> Cheers,
>> 
>> Morten Krogh.
>> 
>> 
>> 
>> 
> 
> 


