[erlang-questions] replace inet_dist TCP in favour of Tilera UDN for speed

Motiejus Jakštys desired.mta@REDACTED
Fri Oct 19 16:43:05 CEST 2012


Hi,

I intend to run 59 Erlang VMs on a Tilera64 and have the nodes exchange
messages over the User Dynamic Network (UDN)[1] instead of TCP. At the
moment Erlang nodes communicate over TCP/IP. However, for NoC systems
like this one, chip-specific interconnects are much, much faster[*].

UDN is a packet-switched network with message ordering and delivery
guarantees. So TCP (and IP) are pure overhead here, and I want to change
distributed message passing to use UDN. For that I have to create a new
inet_tile_dist.erl (similar to inet_tcp_dist.erl) AND make sure
erlang:send/3 works fine with the new, unusual kind of socket.
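
To put a rough number on the header part of that overhead, here is a
small Python sketch. It is only an estimate under stated assumptions:
minimum IPv4 and TCP header sizes without options, and one message per
TCP segment. The function name header_overhead is made up for
illustration. It counts header bytes only and does not model the CPU
cost of going through the kernel network stack, which is likely the
larger part on a NoC.

```python
# Minimum header sizes in bytes, without options.
IPV4_HEADER = 20
TCP_HEADER = 20


def header_overhead(payload_bytes, headers=IPV4_HEADER + TCP_HEADER):
    """Return the fraction of transmitted bytes that are headers.

    Assumes one message per TCP segment, which is a simplification.
    """
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return headers / (headers + payload_bytes)


if __name__ == "__main__":
    # Small payloads are the interesting case for message passing.
    for size in (16, 64, 256, 1024):
        print(f"{size:5d} B payload: {header_overhead(size):.1%} header bytes")
```

With a 40-byte payload, half of the transmitted bytes are headers; the
share shrinks as messages grow.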

inet_tcp_dist looks easy. I establish a connection, update some global
state, and return a socket which can be used for sending the messages.
The socket is the tricky part.

I was looking at how erlang:send/3 works. I started at
`send_3(BIF_ALIST_3)' in erts/emulator/beam/bif.c:2085, and traced it to
`dsig_send' in erts/emulator/beam/dist.c:1609[3]. However, from there I
was unable to get to the libc send. I know there are more wrappers,
because TLS and IPv6 node communication are also possible.

1. How is message sending between nodes abstracted? I would appreciate
   a higher-level explanation and a brief guide to the code showing how
   the buffer is actually *sent*.

In the ssl_distribution documentation[4] I see:

    Note however that a node started in this way will refuse to talk to
    other nodes, as no ssl parameters are supplied (see below).

2. Am I correct to infer that running a heterogeneous Erlang cluster
   (for instance, inet_tcp_dist and inet6_tcp_dist) is not possible?

If I were to implement UDN distribution, my Erlang cluster would be
limited to nodes on the same NoC. But it would be useful to build a
larger cluster -- a cluster of Tilera64 clusters (the Tilera64 has
2x10GbE interfaces). For that to work I still need inet_tcp_dist.

Regards,
Motiejus

[*]: For raw channels, data is communicated at a maximum of 3.93 bytes
     per cycle[2]. Transferring 4 words to a neighbouring core takes 1
     cycle. So if I understand correctly, at a 1 GHz clock this in theory
     means 3.93 GB/s ~ 30 Gb/s with extremely low latency.
[1]: http://www.tilera.com/scm/docs/UG120-Architecture-Overview-TILEPro.pdf
[2]: http://www.princeton.edu/~wentzlaf/documents/Wentzlaff.2007.IEEE_Micro.Tilera.pdf
[3]: https://github.com/erlang/otp/blob/OTP_R15B02/erts/emulator/beam/dist.c#L1609
[4]: http://erlang.org/doc/apps/ssl/ssl_distribution.html
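
The estimate in footnote [*] can be checked with a short Python sketch.
The 3.93 bytes per cycle figure is from [2]; the 1 GHz clock is an
assumption needed to turn bytes per cycle into bytes per second, and
the function name peak_bandwidth is made up for illustration. Decimal
units are used (1 GB = 10^9 bytes).

```python
def peak_bandwidth(bytes_per_cycle, clock_hz):
    """Return (gigabytes_per_second, gigabits_per_second), decimal units."""
    bytes_per_second = bytes_per_cycle * clock_hz
    gigabytes = bytes_per_second / 1e9
    gigabits = bytes_per_second * 8 / 1e9
    return gigabytes, gigabits


if __name__ == "__main__":
    # 3.93 bytes/cycle is quoted from [2]; the 1 GHz clock is an assumption.
    gbytes, gbits = peak_bandwidth(3.93, 1e9)
    print(f"{gbytes:.2f} GB/s, {gbits:.2f} Gb/s")
```

The result scales linearly with the clock, so a slower core clock lowers
the theoretical peak proportionally.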


