Any interest in an RDS distribution?
Wed Jul 15 22:20:26 CEST 2009
Hi. I have just written my first Erlang program: a port driver for InfiniBand (using our com layer). It is a prototype, written as a regular port driver and not as a distribution replacement. Even so, the performance was great. An echo/ping application that uses the port measured 20 usec per 1K buffer for one hop (the round trip to echo the data packet back was 40 usec), including the time in and out of the Erlang port owner's process for the echo server and the ping client.
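For readers who want to reproduce this kind of number, here is a minimal sketch of how a one-hop figure can be derived from an echo/ping loop: time many round trips, average them, and halve the result. It is not the InfiniBand port driver from this post. It assumes a local `socket.socketpair()` as a stand-in transport and a 1K buffer of 1024 bytes, and the helper names (`recv_exact`, `echo_server`, `measure`) are made up for illustration.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes, since a stream socket may return less."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(sock, size, count):
    """Echo `count` messages of `size` bytes back to the sender."""
    for _ in range(count):
        sock.sendall(recv_exact(sock, size))


def measure(size=1024, count=1000):
    """Return (round_trip_usec, one_hop_usec) averaged over `count` pings."""
    client, server = socket.socketpair()  # stand-in for the real transport
    worker = threading.Thread(target=echo_server, args=(server, size, count))
    worker.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(count):
        client.sendall(payload)
        recv_exact(client, size)
    elapsed = time.perf_counter() - start
    worker.join()
    client.close()
    server.close()
    round_trip_usec = elapsed / count * 1e6
    # One hop is taken as half of the averaged round trip.
    return round_trip_usec, round_trip_usec / 2


if __name__ == "__main__":
    rtt, one_hop = measure()
    print(f"round trip: {rtt:.1f} usec, one hop: {one_hop:.1f} usec")
```

The absolute numbers from a local socket pair say nothing about InfiniBand; the point is only the method of averaging over many iterations and halving the round trip.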
I am investigating the possible use of Erlang to implement a reliable system, but we need IB speeds. So I thought I could write a distribution layer on top of the RDS API of OFED. One of our technical founders, Ranjit Pandit, is the original designer of RDS, and we have some expertise in writing IB drivers. Anyone who has tried to write directly on top of OFED knows how hard that is :-(
I was going to start with the kernel/examples/uds_dist example and modify it. If anyone has already done this, then great - share if you can. If anyone has written a different distribution, using the port driver mechanism, that uses EPMD, I'd like to talk with you. Although RDS uses datagrams, it's still reliable. I think I want to use EPMD to discover ports. BTW, the cool thing about RDS is that it uses a single connection between two nodes, no matter how many "connections" exist between those two nodes. For us, this is quite important to conserve end point buffers in clusters that have more than 500 nodes connected over InfiniBand (the hardware runs out of resources otherwise).
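To put rough numbers on the resource argument, here is a small sketch that counts transport connections with and without sharing one connection per node pair. The function names and the figure of 8 logical connections per node pair are hypothetical, chosen only for illustration; the 500-node cluster size comes from the paragraph above.

```python
def per_node_connections(nodes, logical_per_pair, shared):
    """Transport connections one node must hold to reach every peer."""
    peers = nodes - 1
    # With a shared (RDS-style) connection, each peer costs one connection
    # regardless of how many logical connections run over it.
    return peers if shared else peers * logical_per_pair


def cluster_connections(nodes, logical_per_pair, shared):
    """Transport connections across the whole fully connected cluster."""
    pairs = nodes * (nodes - 1) // 2
    return pairs if shared else pairs * logical_per_pair


if __name__ == "__main__":
    nodes = 500          # cluster size mentioned in the post
    logical = 8          # hypothetical logical connections per node pair
    for shared in (False, True):
        label = "shared per pair" if shared else "one per logical connection"
        print(label,
              per_node_connections(nodes, logical, shared),
              cluster_connections(nodes, logical, shared))
```

Under these assumptions a node holds 3992 connections without sharing and 499 with it, so the saving grows linearly with the number of logical connections per pair.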
We would be happy to share this work back with the community as open source (I say this before I've spoken with our lawyer :-). We'd have to research the license issues, etc., but first things first - let's get it working. Any advice is welcome. Or just any cheers of support are welcome too :-)
More information about the erlang-questions