[erlang-questions] System monitoring and logging

Wed Mar 19 18:17:08 CET 2008

About making monitoring built-in instead of an add-on:

Yaws can make a great monitoring tool, which can then be coupled
upstream, with a bit of scripting glue, to whatever monitoring
front-end the site in question prefers, SNMP or otherwise.
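A minimal sketch of the "scripting glue" idea, assuming a current OTP release: a hypothetical module `stats_text` renders `{Key, Value}` stats as plain `key value` lines that a site script could poll over HTTP and hand to SNMP or another front-end. The `out/1` callback assumes the yaws appmod convention of returning `{content, MimeType, Body}`, and the table name `node_stats` is likewise an assumption.

```erlang
-module(stats_text).
-export([render/1, out/1]).

%% Render {Key, Value} pairs as "key value" lines, one stat per line.
%% This is trivial to parse from a shell or Perl script upstream.
render(Stats) ->
    [io_lib:format("~p ~p~n", [K, V]) || {K, V} <- Stats].

%% Yaws entry point. Assumes a readable ets table named node_stats
%% (hypothetical name) was already created by the application.
out(_Arg) ->
    {content, "text/plain", render(ets:tab2list(node_stats))}.
```

Plain text keeps the upstream side simple: an HTTP fetch plus a line split is all a site script needs.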

One approach I've used is to generate much of the application,
supervisor, etc. boilerplate for each new service as it comes up.
Included in the boilerplate is a yaws server on each node, with a
standardized layout containing a link for each custom Erlang
application on the node. The supervisor and main application
boilerplate include code to initialize an ets table for tracking
stats, plus yaws pages that expose those stats. The generic yaws page
shows the node start time, uptime, the nodes list, and a view of the
ets stats table. These features then become available with no extra
work whenever a new application is under development.
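The per-node boilerplate might look roughly like this. It is a sketch only: the module name `node_stats` and the map keys are hypothetical, and it assumes a current OTP release (`erlang:system_time/1` postdates this thread). Uptime comes from `erlang:statistics(wall_clock)`, which reports milliseconds since the VM started, and the node start time is derived from it.

```erlang
-module(node_stats).
-export([init/0, summary/0]).

-define(TAB, node_stats).

%% Create the per-node stats table. Call this from a long-lived
%% process (e.g. during supervisor init): an ets table is deleted
%% when the process that created it exits.
init() ->
    ?TAB = ets:new(?TAB, [named_table, public, set]),
    ok.

%% The generic data shown on every node's status page.
summary() ->
    {UptimeMs, _SinceLastCall} = erlang:statistics(wall_clock),
    Uptime = UptimeMs div 1000,
    #{node => node(),
      start_time => erlang:system_time(second) - Uptime,
      uptime_seconds => Uptime,
      nodes => [node() | nodes()],
      stats => lists:sort(ets:tab2list(?TAB))}.
```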

It's easy to add application-specific parameters to the stats table,
such as transaction count, error count, etc. These then show up
automatically on the application's yaws page.
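Bumping such a counter can be a single `ets:update_counter/4` call, which atomically increments the value and inserts a default entry if the key is missing. A minimal sketch with a hypothetical module `app_counters`, assuming OTP 18 or later for the four-argument form:

```erlang
-module(app_counters).
-export([new/1, incr/2, incr/3, value/2]).

%% Create a public, named counter table for one application.
new(Tab) ->
    ets:new(Tab, [named_table, public, set, {write_concurrency, true}]).

incr(Tab, Key) ->
    incr(Tab, Key, 1).

%% Atomically add N to Key's count, starting from {Key, 0} if absent.
%% Returns the new value.
incr(Tab, Key, N) ->
    ets:update_counter(Tab, Key, {2, N}, {Key, 0}).

%% Read a counter; keys never bumped read as 0.
value(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{Key, V}] -> V;
        [] -> 0
    end.
```

Because the table is public and the update is atomic, any process in the application can call `app_counters:incr(Tab, errors)` directly, without funnelling updates through a server process.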

A large platform consists of several separate Erlang clusters, each
of which has at least two replicating mnesia nodes that store
configuration data (also viewable and settable via the yaws
boilerplate).

A management station can interrogate the core mnesia nodes to see  
what else is in a cluster, then collect stats from all active nodes  
and their application stats.
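A management-station sweep could be sketched as follows, using a hypothetical module `mgmt_collect`. It assumes cluster membership can be read from `mnesia:system_info(running_db_nodes)` on a core node; in the setup described, the list of service nodes may instead live in the configuration tables. Stats are then pulled with `rpc:call/5`, so an unreachable node shows up as `{badrpc, Reason}` instead of aborting the sweep.

```erlang
-module(mgmt_collect).
-export([cluster_nodes/1, collect/2, sweep/2]).

-define(TIMEOUT, 5000). %% ms per RPC; an arbitrary choice

%% Ask one core mnesia node which db nodes are currently running.
cluster_nodes(CoreNode) ->
    case rpc:call(CoreNode, mnesia, system_info, [running_db_nodes], ?TIMEOUT) of
        Nodes when is_list(Nodes) -> {ok, Nodes};
        {badrpc, Reason} -> {error, Reason};
        Other -> {error, {unexpected, Other}}
    end.

%% Fetch the stats table Tab from every node. Failures are kept
%% per node as {Node, {badrpc, Reason}}.
collect(Nodes, Tab) ->
    [{N, rpc:call(N, ets, tab2list, [Tab], ?TIMEOUT)} || N <- Nodes].

%% Discover the cluster through a core node, then collect from it.
sweep(CoreNode, Tab) ->
    case cluster_nodes(CoreNode) of
        {ok, Nodes} -> {ok, collect(Nodes, Tab)};
        Error -> Error
    end.
```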

By the way, it was surprising how useful the template/boilerplate
system turned out to be. When a developer comes up with a good idea on
how to initialize, manage, or update a node, you audit the new
approach, modify it for portability, then merge it into the code
generation system. Your nodes get smarter and smarter over the months
as they roll out. As with Erlang programming in general, one of the
winning features of the templating approach is its gradual learning
curve. Start with something simple that works. It immediately becomes
useful and justifies the effort. Extend ad lib.

On Mar 19, 2008, at 8:54 AM, Peter Mechlenborg wrote:

> Hi
>
> For the last 18 months or so I have been working on an interesting
> project written in Erlang.  Over the last months it has become clear
> to me that we need a more structured way of monitoring our systems.
> Right now we basically just have a log file with lots of different
> information.  I'm starting to realize that monitoring and visibility
> are important properties that should be an integrated part of our
> architecture; not an add-on.  I also think this applies to almost all
> server systems, especially those with high demands on fault
> tolerance, so this issue must have been solved many times before in
> Erlang, or am I wrong here?
>
> We have started looking into SNMP, and this seems promising, even
> though it seems a bit old (I get the impression that SNMP was hot 10
> years ago, but is kind of phasing out right now.  Is this correct?)
> and rigid.  I have not been able to find any alternatives to SNMP; do
> any exist?  I would really like some feedback on how you guys
> handle monitoring and logging on the systems you develop and operate:
> do you use SNMP, some other framework, nothing, or something
> homegrown?