Logger under load

Steve Strong steve@REDACTED
Fri Aug 7 09:32:15 CEST 2020


We are having an issue with Logger when a system is under load.  The behaviours it has around async logging, switching to synchronous logging when the handler is getting behind and finally dropping messages is very useful.  However, there’s an issue when it’s in synchronous mode.  In that mode, for each log request, logger performs gen_server calls to the handler with the default 5 second timeout.  In the event of a timeout, it is handling the exception dropping the log request.  And it’s when this occurs that there’s a problem - the gen_server call to the handler is obviously still in operation, even though the caller (logger, in this case) has timed out and moved on.  At some point it completes, and a message of {Ref, ok} or {Ref, dropped} gets returned to the caller.

We are starting to have to add handle_info clauses of the form

handle_info(Msg = {Ref, OkOrDropped}, State) when is_reference(Ref), (OkOrDropped == ok orelse OkOrDropped == dropped) ->
  {noreply, State}.

to every gen_server, gen_statem etc etc - basically any process that ever makes a log request need to be able to handle these stray messages, otherwise on a loaded system logger starts nicely dropping things, which is then shortly followed by a cascade of processes failing when they receive unexpected messages.

This has been observed in Erlang 21 and 22;  we are not yet running 23, but looking at the code in logger_olp.erl, it looks like it will have the exact same problem.

Any suggestions?  One options is that logger_olp should use a child process when it is in sync mode, such that these gen_server replies can be consumed by the child without polluting the callers message queue;  having to handle these messages in *every* process is a little tedious!



More information about the erlang-questions mailing list