[erlang-questions] high-volume logging via a gen_server
Joe Armstrong
erlang@REDACTED
Mon Oct 4 22:08:19 CEST 2010
disk_log was designed to do this
> erl -man disk_log
NAME
disk_log - A disk based term logging facility
DESCRIPTION
disk_log is a disk based term logger which makes it possible to effi-
ciently log items on files. Two types of logs are supported, halt logs
and wrap logs ....
disk_log is *unreasonably* fast :-)
/Joe
On Mon, Oct 4, 2010 at 1:57 PM, Dan Kelley <djk121@REDACTED> wrote:
> I'm relatively new to erlang. I've been working on a simple router which
> connects to external servers which speak different protocols. When the
> router gets a packet, it decodes it, figures out the destination channel,
> encapsulates it in that channel's required format, and sends it along. This
> all works fine.
>
> I have a pretty standard-looking supervision tree setup for the processes
> which make up the router app. One of the processes is a gen_server which
> handles the logging for the overall app. It provides a simple interface
> (log:info(), log:debug(), etc) to the app. When clients use that
> interface, they end up using gen_server:cast to send the logging process a
> log line. I've moved as much of the CPU load as I can out to the functional
> interface so it's paid by the callers and not the logging process.
>
> Inside the logging gen_server, I'm not doing anything smarter than using
> io:format to write to disk:
>
> logit(Timestamp, FromPid, RequestId, LevelStr, Category, MessageKey,
> LogString, State) ->
> io:format(State#state.log_device,
> "~s,~s,rock,~s,~p,~s,~s,~s,~s,~s~n", [Timestamp,
> State#state.unix_pid,
> State#state.hostname,
> FromPid,
> LevelStr,
> RequestId,
> Category,
> MessageKey,
> LogString]).
>
> When I do performance tests of the overall app, it's invariably the logging
> process which is the limiting factor of the overall system. Usually what
> happens is that the mailbox for the process accumulates several hundred
> thousand messages, which causes the size of the VM to bloat until the host
> starts swapping. (I understand that I could use gen_server:call to make the
> logging synchronous, but that'd slow down all of the transactions, which I'd
> like to avoid if I can.)
>
> I'm pretty sure that the underlying disk is not saturated - I'm just not
> getting that many Mb of log data. My guess is that the single erlang
> process that the logging gen_server is using just can't both read from its
> mailbox and write to disk fast enough to keep up.
>
> So, what are good strategies to cope with a large incoming volume of
> messages that all need to wind up in the same logfile? Is there a more
> efficient way to write to disk than the simple io:format() call than I'm
> using above? What's a good way to parallelize the logging over multiple
> processes but keep all of the information in one file?
>
> Thanks,
>
> Dan
>
More information about the erlang-questions
mailing list