[erlang-questions] high-volume logging via a gen_server

Joe Armstrong erlang@REDACTED
Mon Oct 4 22:08:19 CEST 2010


disk_log was designed to do this

> erl -man disk_log

NAME
       disk_log - A disk based term logging facility

DESCRIPTION
       disk_log is a disk based term logger which makes it possible to
       efficiently log items on files. Two types of logs are supported,
       halt logs and wrap logs ....


disk_log is *unreasonably* fast :-)

/Joe
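
For readers who have not used disk_log before, here is a minimal sketch of the basic calls. It is not taken from the thread: the module name, the log name router_demo, the file name and the logged terms are all made up for illustration. It opens a halt log, writes one term asynchronously with alog/2 and one synchronously with log/2, then reads both back with chunk/2.

```erlang
-module(disk_log_sketch).
-export([demo/0]).

%% Illustrative only: the log name and file name are hypothetical.
demo() ->
    File = "router_demo.log",
    _ = file:delete(File),                      % start from an empty log
    {ok, Log} = disk_log:open([{name, router_demo},
                               {file, File},
                               {type, halt}]),
    %% alog/2 is asynchronous: it returns without waiting for the write.
    ok = disk_log:alog(Log, {info, self(), "packet routed"}),
    %% log/2 is synchronous: it returns once the term has been logged.
    ok = disk_log:log(Log, {debug, self(), "channel lookup"}),
    ok = disk_log:sync(Log),                    % flush to disk
    %% Read the logged terms back from the start of the log.
    {_Continuation, Terms} = disk_log:chunk(Log, start),
    ok = disk_log:close(Log),
    _ = file:delete(File),
    Terms.
```

Logs opened this way use disk_log's internal term format, so the file is read back with disk_log:chunk/2 rather than with a text editor.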



On Mon, Oct 4, 2010 at 1:57 PM, Dan Kelley <djk121@REDACTED> wrote:
> I'm relatively new to erlang.  I've been working on a simple router which
> connects to external servers which speak different protocols.  When the
> router gets a packet, it decodes it, figures out the destination channel,
> encapsulates it in that channel's required format, and sends it along.  This
> all works fine.
>
> I have a pretty standard-looking supervision tree setup for the processes
> which make up the router app.  One of the processes is a gen_server which
> handles the logging for the overall app.  It provides a simple interface
> (log:info(), log:debug(), etc)  to the app.  When clients use that
> interface, they end up using gen_server:cast to send the logging process a
> log line.  I've moved as much of the CPU load as I can out to the functional
> interface so it's paid by the callers and not the logging process.
>
> Inside the logging gen_server, I'm not doing anything smarter than using
> io:format to write to disk:
>
> logit(Timestamp, FromPid, RequestId, LevelStr, Category, MessageKey,
> LogString, State) ->
>    io:format(State#state.log_device,
>      "~s,~s,rock,~s,~p,~s,~s,~s,~s,~s~n", [Timestamp,
>   State#state.unix_pid,
>   State#state.hostname,
>   FromPid,
>   LevelStr,
>   RequestId,
>   Category,
>   MessageKey,
>   LogString]).
>
> When I do performance tests of the overall app, it's invariably the logging
> process which is the limiting factor of the overall system.  Usually what
> happens is that the mailbox for the process accumulates several hundred
> thousand messages, which causes the size of the VM to bloat until the host
> starts swapping.  (I understand that I could use gen_server:call to make the
> logging synchronous, but that'd slow down all of the transactions, which I'd
> like to avoid if I can.)
>
> I'm pretty sure that the underlying disk is not saturated - I'm just not
> getting that many Mb of log data.  My guess is that the single erlang
> process that the logging gen_server is using just can't both read from its
> mailbox and write to disk fast enough to keep up.
>
> So, what are good strategies to cope with a large incoming volume of
> messages that all need to wind up in the same logfile?  Is there a more
> efficient way to write to disk than the simple io:format() call that I'm
> using above?  What's a good way to parallelize the logging over multiple
> processes but keep all of the information in one file?
>
> Thanks,
>
> Dan
>
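
As a rough sketch of how the comma-separated lines in the question could go through disk_log while staying plain text: open the log with {format, external}, build the line in the calling process, and pass the bytes to balog/2. The module name, log name and the shortened field list are assumptions made for this example, not Dan's actual code.

```erlang
-module(text_log_sketch).
-export([open/1, info/3, close/1]).

%% Illustrative only: log name and field layout are hypothetical.
open(File) ->
    disk_log:open([{name, text_log_sketch},
                   {file, File},
                   {format, external},          % raw bytes, not Erlang terms
                   {type, halt}]).

%% The caller pays for formatting; disk_log only receives ready-made bytes.
info(Log, Category, Message) ->
    Line = io_lib:format("~p,info,~s,~s~n", [self(), Category, Message]),
    %% balog/2 is the asynchronous byte-logging call.
    disk_log:balog(Log, unicode:characters_to_binary(Line)).

close(Log) ->
    ok = disk_log:sync(Log),
    disk_log:close(Log).
```

Note that balog/2 is still fire-and-forget, so on its own it does not put a bound on the queue of pending writes; blog/2 is the synchronous counterpart if callers need to be slowed down.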

