optimizing an asynchronous architecture

Tue Jul 7 22:17:27 CEST 2009

Suppose I'm tasked with building a server that can broadcast a message
to 20K subscribers to a topic. No, I do not want to use RabbitMQ since
my implementation fits into 2-3 pages of code.

I'm using gen_server:cast throughout, and a message passes through
5 gen_servers on its way from being published to gen_tcp:send.

#1 is the "transport process" that receives the message and forwards  
it to #2, a locally registered "topic manager".

I need the topic manager intermediary because the names of locally  
registered servers must be atoms and so I cannot locally register a  
server for each topic. The topic manager gen_server keeps track of  
locally registered "subscriber servers" (#3) which keep track of  
subscribers for each topic. Both #2 and #3 use dicts to map topics to  
processes and keep track of subscriber processes respectively.
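For illustration, here is a minimal sketch of what a topic manager like #2 might look like. It is not my actual code: the module name, the {register, ...} and {publish, ...} message shapes, and the choice to drop messages for unknown topics are all assumptions made for the example.

```erlang
-module(topic_manager_sketch).
-behaviour(gen_server).

-export([start_link/0, register_topic/2, publish/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Locally registered under the module name (an atom), so callers
%% do not need one registered name per topic.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Hypothetical API: associate a topic with its subscriber server (#3).
register_topic(Topic, Pid) ->
    gen_server:cast(?MODULE, {register, Topic, Pid}).

%% Hypothetical API: what the transport (#1) would call.
publish(Topic, Msg) ->
    gen_server:cast(?MODULE, {publish, Topic, Msg}).

%% State is a dict mapping Topic -> pid of the subscriber server.
init([]) ->
    {ok, dict:new()}.

handle_cast({register, Topic, Pid}, Topics) ->
    {noreply, dict:store(Topic, Pid, Topics)};
handle_cast({publish, Topic, Msg}, Topics) ->
    case dict:find(Topic, Topics) of
        {ok, Pid} -> gen_server:cast(Pid, {publish, Msg});
        error     -> ok   %% unknown topic: dropped in this sketch
    end,
    {noreply, Topics};
handle_cast(_Other, Topics) ->
    {noreply, Topics}.

handle_call(_Req, _From, Topics) -> {reply, ok, Topics}.
handle_info(_Info, Topics) -> {noreply, Topics}.
terminate(_Reason, _Topics) -> ok.
code_change(_OldVsn, Topics, _Extra) -> {ok, Topics}.
```

Every publish costs one dict lookup plus one extra cast here, which is the hop I would like to shorten or remove.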

#4 is a client proxy that forwards the message back to the transport  
(#1) which pushes it out to the client socket.

How do I go about cutting the message trip time in half?

I tried using fprof but the output is mostly gen_server:loop,
proc_lib:sync_wait, etc. I cannot exclude specific functions since
fprof always calls erlang:trace(PidSpec, true, ...), that is, I can't
disable tracing for the functions I don't care about.

What I would like, ideally, is to timestamp the message as it goes  
through the 1-2-3-4-1 pipeline and calculate the deltas once I'm back  
at #1. Any other suggestions?
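Something along these lines is what I have in mind for the timestamps. This is only a sketch under assumptions: the hop_trace module, the {traced, ...} wrapper and the hop names are made up for illustration, it assumes all hops run on the same node (monotonic time is per node), and erlang:monotonic_time/1 with the microsecond unit needs a reasonably recent OTP release.

```erlang
-module(hop_trace).
-export([new/1, stamp/2, deltas/1, demo/0]).

%% Wrap a payload together with an (initially empty) list of
%% {Hop, TimestampInMicroseconds} stamps, newest first.
new(Payload) ->
    {traced, Payload, []}.

%% Each process calls this on the message just before forwarding it.
stamp(Hop, {traced, Payload, Stamps}) ->
    Now = erlang:monotonic_time(microsecond),
    {traced, Payload, [{Hop, Now} | Stamps]}.

%% Back at #1: turn the stamps into {From, To, Microseconds} legs,
%% in the order the message travelled.
deltas({traced, _Payload, Stamps}) ->
    legs(lists:reverse(Stamps)).

legs([{From, T1}, {To, T2} | Rest]) ->
    [{From, To, T2 - T1} | legs([{To, T2} | Rest])];
legs(_) ->
    [].

%% Simulates the 1-2-3-4-1 trip inside a single process.
demo() ->
    M0 = stamp(transport_in, new(<<"hello">>)),
    M1 = stamp(topic_manager, M0),
    M2 = stamp(subscriber_server, M1),
    M3 = stamp(client_proxy, M2),
    M4 = stamp(transport_out, M3),
    deltas(M4).
```

In the real pipeline each gen_server would call hop_trace:stamp/2 in its handle_cast before casting to the next process, and #1 would call hop_trace:deltas/1 just before gen_tcp:send. The deltas include time spent waiting in each process's mailbox, which is probably what matters most here.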

	Thanks, Joel

---
Mac hacker with a performance bent
http://www.linkedin.com/in/joelreymont