[erlang-questions] System monitoring and logging
Wed Mar 19 18:17:08 CET 2008
About making monitoring built-in instead of an add-on:
Yaws can be a great monitoring tool - which can then be coupled
upstream with a bit of scripting glue to the monitoring front-end
preferred at the site in question, SNMP or other.
An approach I've used is to generate a lot of the app, supervisor,
etc. boilerplate for each new service coming up. Included in the
boilerplate is a yaws server on each node, with standardized layout
containing a link for each custom Erlang application on the node. The
supervisor and main application boilerplate include code to
initialize an ets table for tracking stats and yaws pages for
exposing those stats. The generic yaws page includes node start time,
uptime, nodes list, and a view of the ets stats table. These features
then become available with no added work when a new application is
created.
It's easy to add specific parameters to the stats table that track
transaction count, error count, etc. These are then automatically
visible in the yaws page for the application.
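A minimal sketch of such a stats table, under stated assumptions: the module name node_stats, the table name, and the keys (transactions, errors, start_time) are hypothetical and not taken from the original boilerplate. It relies on ets:update_counter/4, which needs OTP 18 or later. The to_ehtml/0 helper assumes the {ehtml, Term} form that a yaws out/1 function can return.

```erlang
-module(node_stats).
-export([init/0, bump/1, bump/2, snapshot/0, node_info/0, to_ehtml/0]).

%% Hypothetical table name; the original post does not name it.
-define(TABLE, node_stats).

%% Create the named, public stats table and record the start time.
%% The table belongs to the calling process and is deleted when that
%% process exits, so call this from a long-lived process such as the
%% application's top supervisor.
init() ->
    ets:new(?TABLE, [named_table, public, set]),
    ets:insert(?TABLE, {start_time, calendar:universal_time()}),
    ok.

%% Increment a counter by one, creating it at zero on first use.
bump(Key) ->
    bump(Key, 1).

%% Increment a counter by Incr and return the new value.
bump(Key, Incr) when is_integer(Incr) ->
    ets:update_counter(?TABLE, Key, Incr, {Key, 0}).

%% All stats as a sorted list of {Key, Value} tuples.
snapshot() ->
    lists:sort(ets:tab2list(?TABLE)).

%% Generic node facts: node name, uptime in seconds, known nodes.
node_info() ->
    {UptimeMs, _SinceLastCall} = erlang:statistics(wall_clock),
    [{node, node()},
     {uptime_seconds, UptimeMs div 1000},
     {nodes, [node() | nodes()]}].

%% Render the stats as a table term for a yaws page (assumed ehtml form).
to_ehtml() ->
    Rows = [{tr, [], [{td, [], fmt(K)}, {td, [], fmt(V)}]}
            || {K, V} <- snapshot()],
    {ehtml, {table, [], Rows}}.

fmt(Term) ->
    lists:flatten(io_lib:format("~p", [Term])).
```

Application code would call node_stats:bump(transactions) or node_stats:bump(errors) wherever the event happens. Any new key shows up in snapshot/0, and so in the rendered page, without further changes.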
A large platform consists of several separate Erlang clusters, each
of which has at least two replicating mnesia nodes which store
configuration data (also viewable and settable via yaws boilerplate).
A management station can interrogate the core mnesia nodes to see
what else is in a cluster, then collect stats from all active nodes
and their application stats.
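The collection step could be sketched roughly as follows. This is an illustration, not the original management station: the module name stats_collector is hypothetical, the name of the stats table is passed in by the caller, and the cluster membership is taken from the core node's connected-nodes list rather than from the configuration data stored in mnesia.

```erlang
-module(stats_collector).
-export([collect/2]).

%% Ask one core node which nodes it is connected to, then read the
%% named stats table from each of them. Unreachable nodes, and nodes
%% without the table, are reported as errors instead of aborting the
%% whole run.
collect(CoreNode, Table) ->
    case rpc:call(CoreNode, erlang, nodes, [], 5000) of
        {badrpc, Reason} ->
            {error, {core_unreachable, Reason}};
        Peers when is_list(Peers) ->
            %% The core node sees the management station as a peer
            %% too, so drop the local node and the core node from the
            %% peer list before adding the core node back once.
            Nodes = [CoreNode | Peers -- [node(), CoreNode]],
            {ok, [{N, fetch(N, Table)} || N <- Nodes]}
    end.

%% Read the whole stats table from one node, with a 5 second timeout.
fetch(Node, Table) ->
    case rpc:call(Node, ets, tab2list, [Table], 5000) of
        {badrpc, Reason} -> {error, Reason};
        Rows when is_list(Rows) -> {ok, Rows}
    end.
```

Reading the table over rpc keeps the monitored nodes passive: they only maintain their ets table, and the management station decides how often to poll.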
By the way, it was surprising how useful the template/boilerplate
system can be. When a developer comes up with a good idea on how to
initialize or manage or update a node, you audit the new approach and
modify for portability, then merge it into the code generation
system. Your nodes get smarter and smarter over the months as they
roll out. Kind of like Erlang programming in general, one of the
winning features of the templating approach is the gradual learning
curve. Start with something simple that works. It immediately becomes
useful and justifies the effort. Extend ad lib.
On Mar 19, 2008, at 8:54 AM, Peter Mechlenborg wrote:
> For the last 18 months or so I have been working on an interesting
> project written in Erlang. Over the last months it has become clear
> to me that we need a more structured way of monitoring our systems.
> Right now we basically just have a log file with lots of different
> information. I'm starting to realize that monitoring and visibility
> are important properties that should be an integrated part of our
> architecture; not an add-on. I also think this applies to almost all
> server systems, especially those with high demands on fault
> tolerance, so this issue must have been solved many times before in
> Erlang, or am I wrong here?
> We have started looking into SNMP, and this seems promising, even
> though it seems a bit old (I get the impression that SNMP was hot 10
> years ago, but is kind of phasing out right now. Is this correct?)
> and rigid. I have not been able to find any alternatives to SNMP; do
> any exist? I would really like some feedback on how you guys
> handle monitoring and logging on the systems you develop and operate,
> do you use SNMP, some other framework, nothing, or something home