Switching Debug Info On and Off

Fri Feb 21 12:03:28 CET 2003

Siri Hansen (EAB) writes:

 > I think the solution to your problem is tracing. 

If only I'd thought of that before!

How do others handle logging? 

We've been through a few iterations of how to handle logging in our
system and it's been ugly the whole way.

In the beginning, I used error_logger together with the log_mf_h
handler to log all of the error-logger messages to a series of binary
files. There were several problems with that approach:

   1. You can't use any standard unix tools on the logs, e.g. 'tail'
      and 'grep'

   2. There is no compression. Our system has a couple of megabytes
      of flash disk for logging. If we gzip the logs, we can look 
      five or ten times further into the past.

   3. There's no straightforward way to make the logger truncate long
      log entries. Sometimes a crash results in a big binary being
      logged, thus overwriting lots of older logs.

   4. Log corruption after power failure.

   5. Selective logging, i.e. increasing the verbosity of the log
      at runtime, had was implemented by decisions spread throughout
      the system. Hard to maintain.

The next attempt was to use error_logger and write my own handler
which formatted the logs to ASCII. Once the log file got larger than a
certain size, I rename it, start a new one and compress the old one,
basically as 'logrotate' does. Solved #1, #2, #4 and #5. New problems:

   6. Formatting arbitrary terms, especially crash dumps, can cost
      a lot of CPU and memory. We don't have much of either.

   7. The error_logger has no flow control. If #6 is slow, the message
      queue growth can kill the VM (out of memory).

   8. Trying to solve #3 using io_lib:write/2 is not so great.
      write/2 limits output to a certain depth, but what I really
      want is a certain length, otherwise you get greatly varying 
      amounts of information logged depending on the term's structure:

      1> Bin = <<"five boxing wizards jump quickly">>.

      2> lists:flatten(io_lib:write(Bin, 10)).
      "<<102,105,118,101,32,98,111,120,105,...>>"

      3> lists:flatten(io_lib:write({Bin, Bin, Bin, Bin, Bin}, 10)).
      "{<<102,105,118,101,32,98,111,120,...>>,<<102,105,118,101,32,\
       98,111,...>>,<<102,105,118,101,32,98,...>>,<<102,105,118,101,\
       32,...>>,<<102,105,118,101,...>>}"

      It'd also be nice if the formatter realised the binary was a string.

   9. Selective logging implemented in an event handler works well
      and is centralised, but it means that an awful lot of log
      messages are passed around and then dropped. Even with
      all logging disabled this traffic was was eating 20% of our CPU.

The current solution is to use error_logger, my own handler, my own
formatting code (see http://www.corelatus.com/~matthias/trunc_io.erl) 
and tracing of a do-nothing log function for selective logging. 

The formatting code truncates log entries by length rather than depth
and tries to be clever when logging binaries and lists:

   {Z, _} = trunc_io:print({Bin, Bin, Bin, Bin, Bin}, 50), lists:flatten(Z).
   {<<"five boxing wizards jump quickly">>,<<"five...">>...}"

   {X, _} = trunc_io:print("almost a string" ++ [0], 100), lists:flatten(X).
   "almost a string"[0]

I think that solves all my current headaches. Remains to see what new
ones pop up. Keeping track of what's being traced for logging and
what's being traced for debugging is a candidate.

That was a long ramble. Does anyone have any better ideas?

Matthias