Low latency issues (with solution and some questions)

Mon Jun 29 17:46:58 CEST 2009

Hi all!

I'm working on a system that handles a large number of small messages
where low latency is important. The architecture is like this: There is
a process (the network process) that owns a socket and controls sending
and receiving messages on it. This process also manages a set of ets
tables that map incoming message replies to Erlang pids, and it
forwards each reply to the appropriate process. Another process (the
proxy process) acts as a proxy between the network process and a C
node where some computation is performed. Alongside these there are
several other processes running. This architecture gives an acceptable
latency.
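
To make the reply routing concrete, here is a minimal sketch of the
idea. The module name, function names, table layout and message shapes
are all made up for illustration and are not the actual code:

```erlang
-module(reply_router).
-export([new/0, register_request/3, route_reply/3]).

%% Create the table that maps request ids to the pids waiting for a reply.
new() ->
    ets:new(pending_replies, [set, public]).

%% Remember which process is waiting for the reply to ReqId.
register_request(Tab, ReqId, Pid) when is_pid(Pid) ->
    true = ets:insert(Tab, {ReqId, Pid}),
    ok.

%% Look up the waiting process, forward the reply to it and
%% drop the mapping. Unknown request ids are reported to the caller.
route_reply(Tab, ReqId, Payload) ->
    case ets:lookup(Tab, ReqId) of
        [{ReqId, Pid}] ->
            Pid ! {reply, ReqId, Payload},
            true = ets:delete(Tab, ReqId),
            ok;
        [] ->
            {error, unknown_request}
    end.
```

In this sketch the network process would call register_request/3 when
it sends a request and route_reply/3 when the matching reply arrives on
the socket.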

We now want to add support for additional network protocols. I added a
process (the broker) that sits between the proxies and the network
processes. For each network protocol/connection there is a network
process. The broker receives messages from several proxy processes,
manages some ets tables and sends each message to the appropriate
network process. When one of the network processes receives a reply it
sends it to the broker, which does some ets table lookups and sends the
reply to the appropriate proxy. Latency now increases by 50-100µs and
is no longer acceptable.

The broker only does an ets lookup before it passes the message to its
destination, but after this lookup it does some additional processing,
e.g. updating ets tables and sending a message to an audit process. This
additional processing can be deferred until the message has been sent
out on the socket by the network process. I'm using a 4-core machine,
so the network process and the broker could run on different CPUs, but
it seems like this is not the case. As I understand it, there is a
separate run queue for each core and processes only migrate between
run queues 4 times/s, so if the broker and the network process are in
the same run queue the ets table updates would be performed before the
message is sent out on the socket.
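
The ordering inside the broker looks roughly like the sketch below.
All names here (broker_sketch, the Routes table of {Key, NetPid}
tuples, the message tuples) are hypothetical and only illustrate the
"forward first, do bookkeeping afterwards" structure:

```erlang
-module(broker_sketch).
-export([start/2, loop/2]).

%% Routes: a public ets table holding {Key, NetPid} tuples.
%% Audit:  pid of the audit process.
start(Routes, Audit) ->
    spawn(?MODULE, loop, [Routes, Audit]).

loop(Routes, Audit) ->
    receive
        {outgoing, Key, Msg} ->
            case ets:lookup(Routes, Key) of
                [{Key, NetPid}] ->
                    %% Latency-critical step: hand the message over first.
                    NetPid ! {send, Msg},
                    %% Bookkeeping that could be deferred.
                    Audit ! {forwarded, Key, Msg};
                [] ->
                    Audit ! {dropped, Key, Msg}
            end,
            loop(Routes, Audit);
        stop ->
            ok
    end.
```

Note that sending to NetPid first does not by itself mean the network
process runs first: if both processes sit in the same run queue, the
broker keeps running and finishes its bookkeeping before the network
process gets scheduled.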

I verified that this is the case by doing a yield() in the broker
right after it sends out a message and setting the process priority of
the network process, the broker and the proxies to high. Now I have
the latencies back to the original levels.
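
For reference, the two tweaks amount to something like this sketch.
The module and function names are made up; process_flag/2 and
erlang:yield/0 are the standard BIFs:

```erlang
-module(latency_tweaks).
-export([make_high_priority/0, forward_and_yield/2]).

%% Call from inside the process whose priority should be raised.
%% Returns the previous priority level of the calling process.
make_high_priority() ->
    process_flag(priority, high).

%% Send the message to the network process, then voluntarily give up
%% the scheduler so other runnable processes can run before the caller
%% continues with its bookkeeping.
forward_and_yield(NetPid, Msg) ->
    NetPid ! {send, Msg},
    true = erlang:yield(),
    ok.
```

The yield only gives other runnable processes a chance to run; it does
not guarantee that the network process is the one picked next.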

What other architectures would you suggest? I was thinking about using
ets concurrently with global tables.

What alternatives to yield() and process priorities are there for
controlling latency in Erlang systems?

Is it possible to force processes to stay in a specific run queue?

What about priority messages, where sending a message would also force
the sending process to be scheduled out and the receiving process to be
scheduled in?

Thanks,
Erik Rigtorp