[erlang-questions] Performance issue with snmp

Wed Jan 11 19:14:47 CET 2012

On Wed, Jan 11, 2012 at 5:14 AM, Micael Karlberg <bmk@REDACTED> wrote:
> On 01/10/2012 10:25 PM, Phillip Toland wrote:
>> I am working on an application that uses SNMP heavily. This
>> application needs to make hundreds of SNMP queries during normal
>> operation and, this being Erlang, I tried to speed things up by
>> issuing queries in parallel. Unfortunately, the performance of snmpm
>> decreases considerably as you add more concurrent requests, with
>> requests timing out above a certain threshold. To illustrate the
>> problem I have created a minimal code sample that reproduces it.
>> Since snmpm requires a little configuration I have uploaded the code
>> and config files to Github: https://github.com/toland/snmp_perf
>>
>> If I run snmp_perf:sync() with any number of requests, performance is
>> very stable with every request returning in about 0.002 seconds. I am testing with
>> a device on the network so this is reasonable performance. However,
>> running snmp_perf:async(10) results in requests taking on average 0.01
>> seconds and the performance continues to get worse until about 80
>> concurrent requests when I start getting timeouts and the average
>> request time is about 0.08 seconds. Increasing the number of requests
>> results in longer request times and more timeouts.
>>
>> My question is this: is this performance degradation the expected
>> behavior or does it represent a bug in snmpm? It looks like a bug to
>> me but I wanted to get a second opinion. If this is the expected
>> behavior does anyone have a suggestion for ameliorating the problem in
>> a highly concurrent application? I have considered implementing a
>> queue and worker pool as a way to limit the concurrent requests to
>> snmpm, but was hoping there was a better solution.
>
>
> I would not go as far as calling it a bug. Possibly there is a
> design flaw.

OK. I am fine with either term. The bottom line is that I did not
expect that kind of slowdown.

> But, looking at your code, I see that you are using
> snmpm:sync_get/3 (there is also snmpm:sync_get2/3,4) for both the
> synchronous and asynchronous calls.

From reading the code it looks like sync_get2 is exactly the same code
as sync_get. I just ran a quick benchmark and they don't appear to
perform any differently.

> I am wondering why you are not
> using snmpm:async_get/3 (or snmpm:async_get2/3,4) for the
> asynchronous calls. Did you find a problem with that?

My application exposes an API. Each call to the API results in several
SNMP queries which are done in sequence since later queries rely on
the results of earlier queries. For example, I am talking to Cisco
switches and I want to know which VLAN is assigned to each port. I
first issue a query for the list of ports then issue a query for the
VLAN information on each port. Within the context of a single API
call, sync_get appears to be the appropriate way to go. I spent quite
a bit of time with the snmpm code and all SNMP operations are
asynchronous under the covers. snmpm_server simply sticks a record
into ETS for each SNMP request then returns noreply to the caller.
When the request completes it looks up the request in ETS and sends
the reply to the caller. At least, that is my understanding of the
code. Since I really do need synchronous requests within the context
of a single API call I would have to do something very similar to this
in my implementation of the snmpm_user behavior. I didn't want to
reinvent the wheel and the code in snmpm_server looks sound to me. I
think that the terms "sync" and "async" have perhaps muddied the
waters a bit. To be clear, what I need to do is make multiple
concurrent synchronous SNMP requests.
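
To make the pattern described above concrete, here is a minimal sketch of a gen_server that defers its reply: it records the caller in ETS, returns noreply, and answers later when the response shows up. This is not the snmpm_server source. The module name, function names, message shapes, and the 5000 ms timeout are all made up for illustration, and complete/3 simply stands in for a response arriving from the network.

```erlang
-module(deferred_reply).
-behaviour(gen_server).

%% Hypothetical names throughout; this only illustrates the
%% "store the caller, reply later" pattern, not snmpm itself.
-export([start_link/0, sync_request/2, complete/3]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Blocks the calling process until complete/3 is called for ReqId
%% (or the 5000 ms call timeout expires).
sync_request(Server, ReqId) ->
    gen_server:call(Server, {request, ReqId}, 5000).

%% Stands in for the response arriving from the device.
complete(Server, ReqId, Result) ->
    gen_server:cast(Server, {response, ReqId, Result}).

init([]) ->
    {ok, ets:new(pending, [set, private])}.

handle_call({request, ReqId}, From, Tab) ->
    %% Remember who asked, then return without replying so the
    %% server can accept the next request immediately.
    true = ets:insert(Tab, {ReqId, From}),
    {noreply, Tab}.

handle_cast({response, ReqId, Result}, Tab) ->
    %% Look up (and remove) the pending request, then answer the
    %% caller that has been blocked in gen_server:call/3.
    case ets:take(Tab, ReqId) of
        [{ReqId, From}] -> gen_server:reply(From, {ok, Result});
        []              -> ok   % unknown or already answered
    end,
    {noreply, Tab}.

handle_info(_Msg, Tab) ->
    {noreply, Tab}.
```

From the caller's point of view sync_request/2 is synchronous, yet the server process never blocks while a request is outstanding, which is why many callers can be waiting at the same time.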

All that having been said, I will try to benchmark async_get this
afternoon to see if I can make that work. It is going to be hard to
get an apples-to-apples comparison, but if I can get the performance I
need that is all I really care about.

> Also, you could ask for more than one OID in every request. There
> is after all quite a bit of overhead for every request, so asking
> for more than one OID at a time saves a lot of processing.
> As Dimitry wrote, you could also use get-bulk.

get-bulk and requesting multiple OIDs don't really help in my
situation since the problem I am having is with multiple concurrent
requests to different devices. Talking to one device is fine and shows
acceptable performance; it is only when I need to talk to a lot of
devices at once that I start to see problems. Since the simple code
sample I wrote showed significant problems with 80 concurrent
requests, and that is a reasonable level of concurrency to expect in
my application, I do not believe that making fewer calls to individual
devices will solve my problem.
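
For reference, the queue and worker pool idea from the original message could be sketched roughly as follows: run a blocking function over a list of items while keeping at most Limit calls in flight. The module name, function names, and message tag are hypothetical, and nothing here is specific to snmpm. Whether capping concurrency actually avoids the timeouts is untested.

```erlang
-module(bounded_pmap).
-export([run/3]).

%% Hypothetical helper: applies Fun to every element of Items with at
%% most Limit calls running concurrently. Returns {Item, Result} pairs
%% in completion order, not input order.
run(Fun, Items, Limit)
  when is_function(Fun, 1), is_integer(Limit), Limit > 0 ->
    Tag = make_ref(),   % unique tag so stray messages are ignored
    loop(Fun, Items, Limit, 0, Tag, []).

%% Nothing left to start and nothing running: done.
loop(_Fun, [], _Limit, 0, _Tag, Acc) ->
    lists:reverse(Acc);
%% Below the limit with work remaining: start another worker.
loop(Fun, [Item | Rest], Limit, Running, Tag, Acc) when Running < Limit ->
    Parent = self(),
    %% spawn_link: a crash in Fun also takes down the caller.
    spawn_link(fun() -> Parent ! {Tag, Item, Fun(Item)} end),
    loop(Fun, Rest, Limit, Running + 1, Tag, Acc);
%% At the limit, or only waiting for stragglers: collect one result.
loop(Fun, Items, Limit, Running, Tag, Acc) ->
    receive
        {Tag, Item, Result} ->
            loop(Fun, Items, Limit, Running - 1, Tag, [{Item, Result} | Acc])
    end.
```

A call such as bounded_pmap:run(fun query_device/1, Devices, 20) would then keep at most 20 requests outstanding, where query_device/1 is an assumed wrapper around whichever synchronous snmpm call the application uses.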

~p