[erlang-questions] Any canonical bandwidth benchmarks for clusters?
Tue Dec 2 22:39:20 CET 2008
>> I'm testing some code that is showing terrible bandwidth
>> on our 40+ node Infiniband cluster (7 MB/sec for a ring benchmark!).
>> This code is using the IP over IB interface and sends big (1+ MB)
>> messages around (binaries and non-binaries). MPI code runs
>> at good rates on this cluster.
> This is probably a really stupid question, but how are you measuring
> the bandwidth? The ring benchmark is only going to measure the
> bandwidth of one node to another node at any given time. What you may
> want is a mesh network where each node receives a single message and
> then passes that off to another random node. This way half the nodes
> can be talking to the other half and your bandwidth should go through
> the roof.
> It just seems that a ring benchmark is the wrong benchmark to use. But
> I'm no Erlang guru.
Thanks for the reply, Timothy.
I measure the total time around the ring and take the total
payload to be the number of hops times the message size;
bandwidth is that payload divided by the total time. So, this
accounts for receive buffering and actually touching the
message. It's synchronous but should show decent bandwidth
(it's essentially point-to-point a bunch of times synchronously).
In fact, a 2-node ring is a ping-pong benchmark.
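
The arithmetic behind that measurement can be sketched as follows.
This is only an illustration in Python, not the benchmark code itself:
the function name, the choice of 1 MB = 1,000,000 bytes, and the
example numbers are assumptions made for the sketch.

```python
def ring_bandwidth_mb_per_s(hops, message_bytes, elapsed_seconds):
    """Effective bandwidth of a synchronous ring pass, in MB/s.

    hops            -- number of node-to-node sends (assumed name)
    message_bytes   -- size of the message forwarded on each hop
    elapsed_seconds -- wall-clock time for the whole trip around the ring
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Total payload is the number of hops times the message size.
    total_payload_bytes = hops * message_bytes
    # Assumption: 1 MB is taken as 1,000,000 bytes.
    return total_payload_bytes / elapsed_seconds / 1_000_000


# Illustrative numbers only: 40 hops of a 1 MB message in 5 seconds.
print(ring_bandwidth_mb_per_s(40, 1_000_000, 5.0))  # prints 8.0
```

With hops set to 2 the same formula gives the ping-pong figure, since
each round trip is just two point-to-point sends.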
I also have some other benchmarks (all-pairs, all-to-all)
that are similarly slow but aren't sufficiently clean to
post. And you are correct, the ring is probably the least
representative of the messaging patterns we use. It's just
what I had lying around that was debugged and worked.
It's possible that we have a system problem with the
Erlang run-time interface to the OS that we don't know
how to debug. We are pretty sure that the OS is doing
the right thing with the IP over IB drivers and the IB NIC,
but we really need to confirm that the right things are happening.
Los Alamos National Laboratory