[erlang-questions] Performance issue with snmp

Sat Jan 21 23:16:31 CET 2012

Sorry it took me so long to get back on this one. I got pulled onto
something else for a while.

On Thu, Jan 12, 2012 at 9:20 AM, Micael Karlberg <bmk@REDACTED> wrote:
> On 01/11/2012 07:14 PM, Phillip Toland wrote:
>> From reading the code it looks like sync_get2 is exactly the same code
>> as sync_get. I just ran a quick benchmark and they don't appear to
>> perform any differently.
>
> I did not mention it because it is different, but because it's the
> "new" version, which offers an easier way to pass arguments (the
> timeout among other things).

Now that I look more closely at the signatures I see that. Interesting.
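
For anyone following along, here is roughly how I read the difference.
This is only a sketch: the module and function names are mine, the OID
is just sysDescr.0, and I am assuming the timeout option is given in
milliseconds. It also assumes the manager is already started and that
UserId and TargetName have been registered with it.

```erlang
-module(snmp_get_sketch).
-export([send_opts/1, get_sysdescr/3]).

%% Build the option list taken by the "2" variants. Only the timeout
%% is set here; other options are left at their defaults.
send_opts(TimeoutMs) when is_integer(TimeoutMs), TimeoutMs > 0 ->
    [{timeout, TimeoutMs}].

%% Assumption: the snmp manager is running and UserId/TargetName are
%% already registered. Nothing here sets that up.
get_sysdescr(UserId, TargetName, TimeoutMs) ->
    Oids = [[1,3,6,1,2,1,1,1,0]],   % sysDescr.0
    snmpm:sync_get2(UserId, TargetName, Oids, send_opts(TimeoutMs)).
```

With sync_get the timeout is a positional argument; with sync_get2 it
goes into the option list, which is the only difference I can see.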

> The obvious way *around* this problem is to increase the timeout time as
> you increase the load (proportional to N).

I tried that and the results were unsatisfactory. I ended up with the
timeout set as high as 60 seconds and I was still getting timeouts
with a relatively small number of concurrent requests.

> Spawning processes in Erlang is cheap, but not free. Having one
> process that issues X async requests and then awaits
> X replies is cheaper than spawning X processes that each issue
> one request and await the reply (even if the waiting is hidden
> within snmpm).

Right, which is why it didn't make sense to me that I would see any
difference between using sync_get and async_get.
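
To make the comparison concrete, the two shapes I benchmarked look
roughly like this. It is a stripped-down sketch, not my actual test
code: SyncFun and AsyncFun are stand-ins for the snmpm calls, and I am
assuming the async reply is forwarded to the caller as {Ref, Reply}
(with snmpm that forwarding would have to be done from the user
module's handle_pdu callback).

```erlang
-module(fanout_sketch).
-export([per_process/2, single_collector/2]).

%% Pattern 1: spawn one process per request. Each process blocks on
%% its own synchronous call and sends the result back to the parent.
per_process(SyncFun, Targets) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, SyncFun(T)} end),
                Ref
            end || T <- Targets],
    collect(Refs).

%% Pattern 2: a single process issues every request and then waits
%% for all replies. AsyncFun(Target, ReplyTo, Ref) is hypothetical and
%% must eventually cause {Ref, Reply} to be sent to ReplyTo.
single_collector(AsyncFun, Targets) ->
    Self = self(),
    Refs = [begin
                Ref = make_ref(),
                AsyncFun(T, Self, Ref),
                Ref
            end || T <- Targets],
    collect(Refs).

%% Wait for one reply per reference, in request order.
collect(Refs) ->
    [receive
         {Ref, Reply} -> Reply
     after 5000 ->
         {error, timeout}
     end || Ref <- Refs].
```

Either way the caller ends up waiting on the same two manager
processes, so the only thing the second pattern saves is the spawns.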

> But your test will show if this actually makes a difference.

And indeed there was not any difference.

> The way the manager is designed, spawning a process for each
> request does very little to distribute load, since all that
> process does is (basically) to wait for a reply. The actual processing
> is done in the snmpm_server and snmpm_net_if processes.

Yes, and profiling my simple demo application showed that about 60% of
the time for each call was spent in snmpm_server and about 40% in
snmpm_net_if. I was able to identify some hotspots with the profiling
(a lot of time is spent in ETS), but nothing that I feel is a smoking
gun for the problem I am seeing.

I am coming to the conclusion that the root problem is a mismatch in
requirements. It seems that snmpm was built on a few assumptions that
are counter to what I am trying to do. For example, that there are a
relatively small number of agents with extended interactions with each
agent. I, on the other hand, have a huge number of agents and my
interactions will be few and brief. Also, snmpm doesn't seem to be
designed with concurrency in mind, and concurrency is crucial for what
I am doing. You cannot, for example, start multiple snmpm applications
on different ports to increase concurrency. So that leads me to the
conclusion that what I need to do is build an snmp manager that is
geared towards my goals and stop trying to fit a square peg into a
round hole. I don't relish the idea of building my own manager, but it
is hard to escape the conclusion that I cannot make snmpm meet my
performance requirements without fundamentally changing the
architecture.

-- 
~p