[erlang-questions] error_logger and the perils of asynchronicity

Thu May 14 23:38:46 CEST 2009

Marc Sugiyama wrote:
> In thinking specifically about error_logger, one solution might be to
> have it check its queue and shed work when it is overloaded.  For
> example, it could shed work by:
> 
> 1. dumping unformatted messages into the log rather than formatted messages.
> 2. abandoning a message and noting in the log that it lost a message.
> 3. abandoning messages with no record.

None of these addresses the messages piling up in the message queue of 
the process in question. Even worse, if 1000 processes crash (with 
large error reports) within a short timespan, it doesn't matter whether 
the call to the error logger is synchronous or not: the messages will 
still be in the message queue of error_logger, and will cause the heap 
of the error logger to be rather large. However, these messages come 
from somewhere, so if those processes have exited, their memory is 
available again, and I don't really know if this is relevant.
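
To make the first point concrete: the only place to keep a report out
of error_logger's queue is the sender. A minimal sketch, assuming
error_logger is a locally registered process (as in the OTP releases
discussed here). The module name guarded_log, the 'dropped' return
value and the threshold of 1000 are made up for illustration and are
not part of OTP:

```erlang
-module(guarded_log).
-export([error_report/1, error_report/2, queue_len/1]).

%% Hypothetical threshold; the right value depends on the system.
-define(MAX_QUEUE, 1000).

%% Hand the report to error_logger only if its message queue is
%% shorter than the threshold. Returns ok if the report was sent,
%% dropped if it was shed.
error_report(Report) ->
    error_report(Report, ?MAX_QUEUE).

error_report(Report, MaxQueue) ->
    case queue_len(error_logger) of
        Len when is_integer(Len), Len < MaxQueue ->
            error_logger:error_report(Report);
        _ ->
            %% Overloaded, or no such registered process: shed it.
            dropped
    end.

%% Message queue length of a locally registered process,
%% or undefined if the name is not registered or the process died.
queue_len(Name) when is_atom(Name) ->
    case whereis(Name) of
        undefined ->
            undefined;
        Pid ->
            case process_info(Pid, message_queue_len) of
                {message_queue_len, Len} -> Len;
                undefined -> undefined
            end
    end.
```

The check is racy (many senders can pass it at the same time), so it
only bounds the queue approximately, and dropped reports are lost
without a trace unless the caller counts them.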

However, as has been pointed out earlier, the *reason* for the crash 
is seldom error_logger itself, even though it may be slow at processing 
messages the way it does now. AFAIK there are options to dump the 
messages in binary format to a sasl log, which can then be inspected 
later. Maybe what we want to do is just turn off pretty printing of logs.
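
For reference, a sketch of the sasl settings I have in mind, as a
sys.config fragment. The parameter names are the ones documented for
the sasl application; the directory and the size limits are made-up
values:

```erlang
%% sys.config (start with: erl -boot start_sasl -config sys)
[{sasl,
  [%% no formatted sasl reports on the tty
   {sasl_error_logger, false},
   %% binary, wrapping report files go here (hypothetical path)
   {error_logger_mf_dir, "/var/log/myapp/sasl"},
   %% size of each file in bytes (made-up value)
   {error_logger_mf_maxbytes, 10485760},
   %% number of files to rotate through (made-up value)
   {error_logger_mf_maxfiles, 10}]}].
```

The reports can then be read later with the report browser, e.g.
rb:start([{report_dir, "/var/log/myapp/sasl"}]) followed by rb:list()
and rb:show(N). Formatting happens at read time, not when the report
is logged.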

The problem we're seeing is that most of these messages can be lost in 
a crash, so we don't really know what went wrong after it has happened. 
Binary dumping is faster, but you might still lose messages. I don't 
really know of any good solution for any of this, but making the call 
synchronous might not have helped in the previously described case.

I'm not complaining, just trying to add to the discussion :)

> More generally, some library support for detecting overloaded
> processes might be helpful.  I suspect there are several strategies
> for doing so (e.g., checking the message queue length. time to process
> the message queue, etc.).
> 
> Marc
> 
> On Thu, May 14, 2009 at 9:58 AM, Ulf Wiger
> <ulf.wiger@REDACTED> wrote:
>> in all fairness I should say that the system most likely
>> would have died of other causes, but in the cases I've
>> seen, error_logger has been among the 3 largest processes
>> each time.
>>
>> Something similar to the synchronous door send was discussed
>> by Erik Stenman at EUC02 (http://www.erlang.se/workshop/2002/Stenman.pdf)
>>
>> I think there are very good reasons to remove the penalty
>> on send, making ! even more asynchronous. I think that today,
>> gen_server:call() is so fast that there may not be any need
>> for a new primitive, but it's doubtful whether gen_event
>> could be modified to use calls due to BW compatibility reasons.
>>
>> BR,
>> Ulf W
>>
>> Michael Radford wrote:
>>> This seems like just another instance of the "multiple producers, one
>>> consumer" problem that is easy to get bitten by in Erlang.
>>>
>>> The usual party line response is, if one of your processes is getting
>>> overloaded like this, you need to implement flow control.  But probably
>>> the OTP team would be reluctant to do that with error_logger because
>>> most of the time (when messages are rare enough), asynchronous gives
>>> better performance.
>>>
>>> Another way to address this problem, which I'm sure has been discussed
>>> before, would be changes to the scheduler.
>>>
>>> What if there were two new send operators, just like !, but with
>>> scheduling side effects:
>>>
>>> - a "synchronous door" send, for when you are sending a message to a
>>>   server process that will do some work for you and send a reply which you
>>>   will wait for.  The scheduling change would be something like: the
>>>   server process is immediately scheduled in for the remainder of the
>>>   client process's time slice, and then the next time the client
>>>   process enters a receive, the server process gets all of the client's
>>>   time slices (as though the client were continuing to run) until it
>>>   sends a message to the client, the client exits the receive, either
>>>   process dies, etc.
>>>
>>> - an "asynchronous door" send, for things like error_logger, and logging in
>>>   general.  This would somehow give extra cycles to the server process at
>>>   some point in the future, whether or not the client process still
>>>   exists.  Ideally, that would be just enough extra cycles to consume the
>>>   message on average, but the right design is tricky.
>>>
>>> If I understand correctly, right now processes get a scheduling penalty
>>> for sending to a process with a large message queue (large in bytes or
>>> messages?).  But that doesn't help in all situations, e.g., when new
>>> processes are being created all the time.  (It obviously didn't help in
>>> Ulf's situation.)
>>
>> --
>> Ulf Wiger
>> CTO, Erlang Training & Consulting Ltd
>> http://www.erlang-consulting.com
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>

-- 
Oscar Hellström, oscar@REDACTED
Office: +44 20 7655 0337
Mobile: +44 798 45 44 773
Erlang Training and Consulting Ltd
http://www.erlang-consulting.com/