os_mon & alarm_handler in R10B-10

Serge Aleynikov serge@REDACTED
Tue Mar 28 16:22:06 CEST 2006


Micael,

It is not directly apparent that these functions are optional for SNMP 
tables (especially for the OS_MON and OTP_MIBS applications, which 
happen to expect the data to be stored in mnesia tables).  If by 
'optional' you mean that it's up to the programmer to decide whether to 
implement them, that's one thing; but if the programmer does decide to 
implement them, then, IMHO, errors in these functions should not be 
ignored.

The docs don't say anything about ignoring exceptions raised in the 
new/delete functions:

http://www.erlang.org/doc/doc-5.4.13/lib/snmp-4.7.1/doc/html/snmp_instr_functions.html#9

   9.1.1 New / Delete Operations

   ...

   For tables:

   table_access(new [, ExtraArg1, ...])
   table_access(delete [, ExtraArg1, ...])

   These functions are called for each object in an MIB when the MIB is
   unloaded or loaded, respectively.


Moreover, depending on the startup order of the snmp & mnesia apps listed 
in the release file, os_mon and otp_mibs will behave differently, as was 
illustrated in my previous email.  I suggest that this either be 
documented or, better, fixed by propagating the raised exception.
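
To illustrate what I mean by propagating the exception, here is a rough 
sketch.  The module and function names (instr_guard, call_new) are made 
up for this example and are not part of snmp; it only shows the idea of 
turning the caught {'EXIT', Reason} into a real exit for the 'new' 
operation:

```erlang
-module(instr_guard).
-export([call_new/3]).

%% Hypothetical helper, not part of the snmp application.
%% Calls the instrumentation function M:F with the 'new' operation
%% prepended to the extra arguments A.  If the call crashes, exit
%% with a descriptive reason instead of returning {'EXIT', Reason}.
call_new(M, F, A) ->
    case catch apply(M, F, [new | A]) of
        {'EXIT', Reason} ->
            exit({instrumentation_failed, {M, F, new, Reason}});
        Result ->
            Result
    end.
```

With something like this, the {aborted,{node_not_running,...}} result 
from otp_mib:erl_node_table(new) would surface to the caller rather than 
being dropped.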

So, if os_mon_mib:load(snmp_master_agent) or 
otp_mib:load(snmp_master_agent) is called with the snmp application 
started but without mnesia running, the functions don't fail.  If mnesia 
is then started right after these calls, it becomes pretty difficult to 
figure out why the SNMP manager is reporting errors in SNMP queries, as 
all applications seem to be running as expected on the SNMP agent node.
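
As a stopgap on the application side, one could check that mnesia is 
running before loading the MIBs.  This is only a sketch: the module name 
mib_loader and the error tuple are my own invention, and it assumes the 
load/1 functions return ok or {error, Reason}:

```erlang
-module(mib_loader).
-export([load_mibs/1]).

%% Hypothetical wrapper: refuse to load the OTP and OS_MON MIBs
%% unless mnesia reports that it is running.
%% mnesia:system_info(is_running) returns yes | no | starting | stopping.
load_mibs(Agent) ->
    case mnesia:system_info(is_running) of
        yes ->
            case otp_mib:load(Agent) of
                ok    -> os_mon_mib:load(Agent);
                Error -> Error
            end;
        Status ->
            {error, {mnesia_not_running, Status}}
    end.
```

Calling mib_loader:load_mibs(snmp_master_agent) before mnesia is started 
would then return an error instead of appearing to succeed.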

Regards,

Serge

Micael Karlberg wrote:
> Hi,
> 
> The new (and delete) function is an optional one. Also
> there is no defined return value for this function.
> Therefore it's not worth the effort to try to figure out if
> the result is ok or not.
> 
> /BMK
> 
> Serge Aleynikov wrote:
> 
>> Gunilla,
>>
>> I believe there might be another bug in SNMP revealed by my 
>> experiments with OS_MON & OTP_MIBS.  If mnesia is started *after* the 
>> snmp agent, and the snmp agent has the mibs parameter set, an attempt 
>> to initialize mib OIDs using instrumentation functions with the 'new' 
>> operation (such as otp_mib:erl_node_table(new)), leads to an ignored 
>> exception that ideally should prevent the SNMP agent from starting.
>>
>> Release file:
>> =============
>> {release, {"dripdb", "1.0"}, {erts, "5.4.13"},
>>   [
>>     {kernel  , "2.10.13"},
>>     {stdlib  , "1.13.12"},
>>     {sasl    , "2.1.1"},
>>     {lama    , "1.0"},
>>     {otp_mibs, "1.0.4"},
>>     {os_mon  , "2.0"},
>>     {snmp    , "4.7.1"},
>>     {mnesia  , "4.2.5"}
>>   ]
>> }.
>>
>> Config file:
>> ============
>>
>> %%------------ SNMP agent configuration ----------------------
>>   {snmp,
>>      [{agent,
>>         [{config, [{dir, "etc/snmp/"},
>>                    {force_load, true}
>>                   ]},
>>          {db_dir, "var/snmp_db/"},
>>          {mibs,   ["mibs/priv/OTP-MIB",
>>                    "mibs/priv/OTP-OS-MON-MIB"]}
>>         ]
>>       }
>>      ]
>>   }
>>
>> This is a trace of the error which hides the fact that there was a 
>> problem with creation of the 'erlNodeAlloc' table:
>>
>> (<0.126.0>) call 
>> snmpa_mib_data:call_instrumentation({me,[1,3,6,1,4,1,193,19,3,1,2,1,1,1],
>>     table_entry,
>>     erlNodeEntry,
>>     undefined,
>>     'not-accessible',
>>     {otp_mib,erl_node_table,[]},
>>     false,
>>     [{table_entry_with_sequence,'ErlNodeEntry'}],
>>     undefined,
>>     undefined},new)
>> (<0.126.0>) returned from snmpa_mib_data:call_instrumentation/2 ->
>>   {'EXIT',{aborted,{node_not_running,drpdb@REDACTED}}}
>>
>> Therefore all the SNMP manager's calls to OIDs inside 'erlNodeTable' 
>> or 'applTable' tables fail.
>>
>> I can provide additional details if needed, if the information here is 
>> not sufficient.  I believe the proper action would be not to 
>> absorb the error in the call_instrumentation function when the 
>> Operation is 'new'.  I am providing the snippet of code where that 
>> exception is currently ignored:
>>
>> snmpa_mib_data.erl(line 1319):
>> ==============================
>> call_instrumentation(#me{entrytype = variable, mfa={M,F,A}}, Operation) ->
>>     ?vtrace("call instrumentation with"
>>         "~n   entrytype: variable"
>>         "~n   MFA:       {~p,~p,~p}"
>>         "~n   Operation: ~p",
>>         [M,F,A,Operation]),
>>     catch apply(M, F, [Operation | A]);
>> ...
>>
>>
>> Regards,
>>
>> Serge
>>
>>
>> Gunilla Arendt wrote:
>>
>>> It's a bug in os_mon, it shouldn't use get_alarms().
>>> Thanks for the heads up.
>>>
>>> Regards, Gunilla
>>>
>>>
>>> Serge Aleynikov wrote:
>>>
>>>> For now I used the following patch to take care of this issue, but I 
>>>> would be curious to hear the opinion of the OTP staff.
>>>>
>>>> Regards,
>>>>
>>>> Serge
>>>>
>>>> --- alarm_handler.erl.orig      Fri Mar 24 20:08:18 2006
>>>> +++ alarm_handler.erl   Fri Mar 24 20:19:15 2006
>>>> @@ -58,7 +58,12 @@
>>>>  %% Returns: [{AlarmId, AlarmDesc}]
>>>>  %%-----------------------------------------------------------------
>>>>  get_alarms() ->
>>>> -    gen_event:call(alarm_handler, alarm_handler, get_alarms).
>>>> +    case gen_event:which_handlers(alarm_handler) of
>>>> +    [M | _] ->
>>>> +        gen_event:call(alarm_handler, M, get_alarms);
>>>> +    [] ->
>>>> +        []
>>>> +    end.
>>>>
>>>>  add_alarm_handler(Module) when atom(Module) ->
>>>>      gen_event:add_handler(alarm_handler, Module, []).
>>>>
>>>>
>>>> Serge Aleynikov wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've been experimenting with the reworked os_mon in R10B-10, and 
>>>>> encountered the following issue.
>>>>>
>>>>> The documentation encourages replacing the default alarm handler 
>>>>> with something more sophisticated.  For that reason I created a 
>>>>> custom handler - lama_alarm_h (LAMA app in jungerl), which uses 
>>>>> gen_event:swap_sup_handler/3.
>>>>>
>>>>> I initiate that handler prior to starting OS_MON, and then start 
>>>>> OS_MON.
>>>>>
>>>>> In the latest release R10B-10, OS_MON calls 
>>>>> alarm_handler:get_alarms/0 upon startup.
>>>>>
>>>>> This causes the 'alarm_handler' event manager to issue a call in the 
>>>>> alarm_handler.erl module.  However, since that handler was replaced 
>>>>> by a custom alarm handler, the gen_event's call fails with
>>>>> {error, bad_module}.
>>>>>
>>>>> gen_event always dispatches a call/3 to a specific handler module 
>>>>> passed as a parameter, e.g.:
>>>>>
>>>>> -----[alarm_handler.erl (line: 60)]-----
>>>>> get_alarms() ->
>>>>>     gen_event:call(alarm_handler, alarm_handler, get_alarms).
>>>>> ----------------------------------------
>>>>>
>>>>> Yet, if the alarm_handler handler was swapped for another module, 
>>>>> the gen_event:call will report an error, therefore crashing OS_MON.
>>>>>
>>>>> One way to resolve this problem would be to introduce another 
>>>>> exported function in gen_event:
>>>>>
>>>>> gen_event:call(EventMgrRef, Request) -> Result
>>>>>
>>>>> Can the OTP team suggest some other workaround?
>>>>>
>>>>> Serge



