os_mon & alarm_handler in R10B-10

Serge Aleynikov serge@REDACTED
Mon Mar 27 20:32:12 CEST 2006


Gunilla,

I believe there might be another bug in SNMP revealed by my experiments 
with OS_MON & OTP_MIBS.  If mnesia is started *after* the snmp agent, 
and the snmp agent has the mibs parameter set, an attempt to initialize 
mib OIDs using instrumentation functions with the 'new' operation (such 
as otp_mib:erl_node_table(new)), leads to an ignored exception that 
ideally should prevent the SNMP agent from starting.

Release file:
=============
{release, {"dripdb", "1.0"}, {erts, "5.4.13"},
   [
     {kernel  , "2.10.13"},
     {stdlib  , "1.13.12"},
     {sasl    , "2.1.1"},
     {lama    , "1.0"},
     {otp_mibs, "1.0.4"},
     {os_mon  , "2.0"},
     {snmp    , "4.7.1"},
     {mnesia  , "4.2.5"}
   ]
}.

Config file:
============

%%------------ SNMP agent configuration ----------------------
   {snmp,
      [{agent,
         [{config, [{dir, "etc/snmp/"},
                    {force_load, true}
                   ]},
          {db_dir, "var/snmp_db/"},
          {mibs,   ["mibs/priv/OTP-MIB",
                    "mibs/priv/OTP-OS-MON-MIB"]}
         ]
       }
      ]
   }

This is a trace of the error.  The caught exit hides the fact that 
creation of the 'erlNodeAlloc' table failed:

(<0.126.0>) call 
snmpa_mib_data:call_instrumentation({me,[1,3,6,1,4,1,193,19,3,1,2,1,1,1],
     table_entry,
     erlNodeEntry,
     undefined,
     'not-accessible',
     {otp_mib,erl_node_table,[]},
     false,
     [{table_entry_with_sequence,'ErlNodeEntry'}],
     undefined,
     undefined},new)
(<0.126.0>) returned from snmpa_mib_data:call_instrumentation/2 ->
   {'EXIT',{aborted,{node_not_running,drpdb@REDACTED}}}

As a result, all SNMP manager requests for OIDs in the 'erlNodeTable' or 
'applTable' tables fail.
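
Since the failure depends on start order, one possible workaround is to 
make sure mnesia is running before the agent starts.  A minimal sketch, 
assuming the applications are started by hand rather than from a boot 
script; the module and function names are hypothetical and not part of 
OTP:

```erlang
-module(start_order_sketch).
-export([start_apps/1]).

%% Start the given applications strictly in list order.
%% An application that is already running is treated as success.
start_apps([]) ->
    ok;
start_apps([App | Rest]) ->
    case application:start(App) of
        ok ->
            start_apps(Rest);
        {error, {already_started, App}} ->
            start_apps(Rest);
        {error, Reason} ->
            %% Stop at the first failure and report which app caused it.
            {error, {App, Reason}}
    end.
```

It would be called as start_order_sketch:start_apps([mnesia, snmp]), so 
that the instrumentation functions find mnesia running when the agent 
loads its MIBs.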

I can provide additional details if the information here is not 
sufficient.  I believe the proper fix would be for the 
call_instrumentation function not to absorb the error when the Operation 
is 'new'.  Here is the snippet of code where that exception is 
currently ignored:

snmpa_mib_data.erl(line 1319):
==============================
call_instrumentation(#me{entrytype = variable, mfa={M,F,A}}, Operation) ->
    ?vtrace("call instrumentation with"
            "~n   entrytype: variable"
            "~n   MFA:       {~p,~p,~p}"
            "~n   Operation: ~p",
            [M,F,A,Operation]),
    catch apply(M, F, [Operation | A]);
...
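
The change I have in mind could look roughly like the sketch below.  It 
is a standalone illustration, not the actual snmpa_mib_data code: the 
module name and the arity-4 signature (taking M, F and A directly 
instead of the #me record) are assumptions made to keep it 
self-contained.

```erlang
-module(instr_sketch).
-export([call_instrumentation/4]).

%% For the 'new' operation, do not trap the failure: an error or exit
%% raised by the instrumentation function propagates to the caller,
%% which can then refuse to start.
call_instrumentation(M, F, A, new) ->
    apply(M, F, [new | A]);
%% For every other operation keep the current behaviour and turn a
%% failure into an {'EXIT', Reason} return value.
call_instrumentation(M, F, A, Operation) ->
    catch apply(M, F, [Operation | A]).
```

With this split, an aborted mnesia call during 'new' would surface in 
the calling process instead of being returned as an ordinary value.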


Regards,

Serge


Gunilla Arendt wrote:
> It's a bug in os_mon, it shouldn't use get_alarms().
> Thanks for the heads up.
> 
> Regards, Gunilla
> 
> 
> Serge Aleynikov wrote:
> 
>> For now I used the following patch to take care of this issue, but I 
>> would be curious to hear the opinion of the OTP staff.
>>
>> Regards,
>>
>> Serge
>>
>> --- alarm_handler.erl.orig      Fri Mar 24 20:08:18 2006
>> +++ alarm_handler.erl   Fri Mar 24 20:19:15 2006
>> @@ -58,7 +58,12 @@
>>  %% Returns: [{AlarmId, AlarmDesc}]
>>  %%-----------------------------------------------------------------
>>  get_alarms() ->
>> -    gen_event:call(alarm_handler, alarm_handler, get_alarms).
>> +    case gen_event:which_handlers(alarm_handler) of
>> +    [M | _] ->
>> +        gen_event:call(alarm_handler, M, get_alarms);
>> +    [] ->
>> +        []
>> +    end.
>>
>>  add_alarm_handler(Module) when atom(Module) ->
>>      gen_event:add_handler(alarm_handler, Module, []).
>>
>>
>> Serge Aleynikov wrote:
>>
>>> Hi,
>>>
>>> I've been experimenting with the reworked os_mon in R10B-10, and 
>>> encountered the following issue.
>>>
>>> The documentation encourages replacing the default alarm handler 
>>> with something more sophisticated.  For that reason I created a 
>>> custom handler - lama_alarm_h (LAMA app in jungerl), which uses 
>>> gen_event:swap_sup_handler/3.
>>>
>>> I initiate that handler prior to starting OS_MON, and then start OS_MON.
>>>
>>> In the latest release R10B-10, OS_MON calls 
>>> alarm_handler:get_alarms/0 upon startup.
>>>
>>> This causes the 'alarm_handler' event manager to issue a call to the 
>>> alarm_handler.erl module.  However, since that handler was replaced 
>>> by a custom alarm handler, the gen_event call fails with
>>> {error, bad_module}.
>>>
>>> gen_event always dispatches a call/3 to a specific handler module 
>>> passed as a parameter, e.g.:
>>>
>>> -----[alarm_handler.erl (line: 60)]-----
>>> get_alarms() ->
>>>     gen_event:call(alarm_handler, alarm_handler, get_alarms).
>>> ----------------------------------------
>>>
>>> Yet, if the alarm_handler handler has been swapped for another module, 
>>> gen_event:call returns an error, which crashes OS_MON.
>>>
>>> One way to resolve this problem would be to introduce another 
>>> exported function in gen_event:
>>>
>>> gen_event:call(EventMgrRef, Request) -> Result
>>>
>>> Can the OTP team suggest some other workaround?
>>>
>>> Serge
>>>
>>
> 
> 
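
To illustrate the failure mode from the quoted messages: the sketch 
below starts a private event manager, shows that gen_event:call/3 
returns {error, bad_module} when the named handler is not installed, 
and then uses the same which_handlers fallback as the patch above.  The 
module swap_demo, its callbacks and the {set_alarm, Alarm} event are 
hypothetical stand-ins, not the real alarm_handler.

```erlang
-module(swap_demo).
-behaviour(gen_event).

-export([run/0, safe_get_alarms/1]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

run() ->
    {ok, Mgr} = gen_event:start_link(),
    %% Calling a handler module that is not installed does not crash
    %% the manager; the caller simply gets {error, bad_module} back.
    {error, bad_module} = gen_event:call(Mgr, missing_handler, get_alarms),
    ok = gen_event:add_handler(Mgr, ?MODULE, []),
    ok = gen_event:sync_notify(Mgr, {set_alarm, {disk_full, "demo"}}),
    Alarms = safe_get_alarms(Mgr),
    ok = gen_event:stop(Mgr),
    Alarms.

%% Ask whichever handler is installed first, or return [] if none is.
safe_get_alarms(Mgr) ->
    case gen_event:which_handlers(Mgr) of
        [Handler | _] -> gen_event:call(Mgr, Handler, get_alarms);
        []            -> []
    end.

%% Minimal gen_event callbacks; the handler state is the alarm list.
init(_Args) -> {ok, []}.

handle_event({set_alarm, Alarm}, Alarms) -> {ok, [Alarm | Alarms]};
handle_event(_Event, Alarms)             -> {ok, Alarms}.

handle_call(get_alarms, Alarms) -> {ok, Alarms, Alarms};
handle_call(_Query, Alarms)     -> {ok, {error, bad_query}, Alarms}.

handle_info(_Info, Alarms) -> {ok, Alarms}.

terminate(_Reason, _Alarms) -> ok.

code_change(_OldVsn, Alarms, _Extra) -> {ok, Alarms}.
```

Note that the fallback only helps if the first installed handler 
actually answers the get_alarms request, which is an assumption about 
the replacement handler.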

-- 
Serge Aleynikov
R&D Telecom, IDT Corp.
Tel: (973) 438-3436
Fax: (973) 438-1464
serge@REDACTED


