[erlang-questions] error_logger and the perils of asynchronicity
Thu May 14 17:38:19 CEST 2009
This seems like just another instance of the "multiple producers, one
consumer" problem that is easy to get bitten by in Erlang.
The usual party-line response is that if one of your processes is getting
overloaded like this, you need to implement flow control. But probably
the OTP team would be reluctant to do that with error_logger because
most of the time (when messages are rare enough), asynchronous gives
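To make the flow control idea concrete, here is a minimal sketch of one way a client could hold itself back. It is not how error_logger works: the module name flow_log, the ?MAX_QUEUE threshold, and the toy logger loop are all assumptions made up for illustration. The client sends asynchronously while the logger keeps up and waits for an acknowledgement once the logger's queue is long. Note that erlang:process_info/2 only works for local pids.

```erlang
-module(flow_log).
-export([start/0, log/2]).

%% Hypothetical threshold; a real system would have to tune this.
-define(MAX_QUEUE, 100).

%% Start a toy logger process (a stand-in, not the real error_logger).
start() ->
    spawn(fun loop/0).

%% Send asynchronously while the logger keeps up. Once its queue is
%% longer than ?MAX_QUEUE, wait for an ack so the caller is held back.
log(Logger, Msg) ->
    case erlang:process_info(Logger, message_queue_len) of
        {message_queue_len, N} when N > ?MAX_QUEUE ->
            Ref = erlang:monitor(process, Logger),
            Logger ! {log, {self(), Ref}, Msg},
            receive
                {ack, Ref} ->
                    erlang:demonitor(Ref, [flush]),
                    ok;
                {'DOWN', Ref, process, _Pid, Reason} ->
                    {error, Reason}
            end;
        {message_queue_len, _} ->
            Logger ! {log, noreply, Msg},
            ok;
        undefined ->
            %% The logger process is no longer alive.
            {error, noproc}
    end.

%% Toy logger: print each message and ack it if the sender asked for one.
loop() ->
    receive
        {log, From, Msg} ->
            io:format("~p~n", [Msg]),
            reply(From),
            loop()
    end.

reply(noreply) -> ok;
reply({Pid, Ref}) -> Pid ! {ack, Ref}, ok.
```

Usage would be something like Logger = flow_log:start(), flow_log:log(Logger, hello). In the common case this costs one extra process_info call per message; the caller only blocks when the logger is already behind.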
Another way to address this problem, which I'm sure has been discussed
before, would be changes to the scheduler.
What if there were two new send operators, just like !, but with
scheduling side effects:
- a "synchronous door" send, for when you are sending a message to a
server process that will do some work for you and send a reply that you
will wait for. The scheduling change would be something like this: the
server process is immediately scheduled in for the remainder of the
client process's time slice. Then, the next time the client process
enters a receive, the server process gets all of the client's time
slices (as though the client were continuing to run) until the server
sends a message to the client, the client exits the receive, either
process dies, etc.
- an "asynchronous door" send, for things like error_logger, and logging in
general. This would somehow give extra cycles to the server process at
some point in the future, whether or not the client process still
exists. Ideally, that would be just enough extra cycles to consume the
message on average, but the right design is tricky.
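Neither operator exists, and the scheduling effects cannot be written in user code. For reference, this is a sketch of the ordinary request/reply pattern that a "synchronous door" send would target, written with plain !. The module name door_demo and the echo server are assumptions for illustration only. Today the send carries no scheduling hint: the client just blocks in receive until the server happens to be scheduled and replies.

```erlang
-module(door_demo).
-export([start/0, call/2]).

%% Start a toy server that echoes each request back to the caller.
start() ->
    spawn(fun server/0).

%% Plain request/reply. The ! below is an ordinary asynchronous send;
%% the proposed "synchronous door" send would replace it and hand the
%% client's time slices to the server while the client waits.
call(Server, Request) ->
    Ref = erlang:monitor(process, Server),
    Server ! {request, self(), Ref, Request},
    receive
        {reply, Ref, Result} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Result};
        {'DOWN', Ref, process, _Pid, Reason} ->
            {error, Reason}
    end.

server() ->
    receive
        {request, From, Ref, Request} ->
            From ! {reply, Ref, {echo, Request}},
            server()
    end.
```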
If I understand correctly, right now processes get a scheduling penalty
for sending to a process with a large message queue (large in bytes or
messages?). But that doesn't help in all situations, e.g., when new
processes are being created all the time. (It obviously didn't help in
Ulf Wiger writes:
> I recently had reason to stare at a crash dump from a
> system that had run out of memory.
> Mon May 11 20:31:02 2009
> Slogan: eheap_alloc: Cannot allocate 191315400 bytes of memory (of type
> System version: Erlang (BEAM) emulator version 5.6.5 [source] [smp:8]
> [async-threads:10] [kernel-poll:true]
> In other words, we ran out of memory as the garbage collector
> tried to find a place to allocate 191 MB of heap space.
> Looking further down in the crash dump (searching for "Garbing"),
> we find the "guilty" process:
> State: Garbing
> Name: error_logger
> Spawned as: proc_lib:init_p/5
> Last scheduled in for: io_lib_pretty:print_length_list1/3
> Message queue length: 219
> Stack+heap: 38263080
> OldHeap: 47828850
> Heap unused: 13858
> OldHeap unused: 47828850
> Program counter: 0xb13861a8 (io_lib_pretty:print_length_list1/3 + 4)
> CP: 0x00000000 (invalid)
> Surprisingly, it is the error_logger in this case,
> which sports a hefty 38 MB of new heap and 48 MB of
> old heap. It also has 219 messages in the message queue.
> We cannot know what these messages are, as far as I
> understand, since they are not present in the dump
> (and, being unprocessed in the msg queue, certainly
> haven't been written to disk).
> Personally, I feel that this is a good illustration
> of how things can go wrong when one relies on asynch
> send as a way to not delay the working processes too
> much. Sometimes they really do need to be held back,
> and then, we don't have any means to do so.
> I'm doubtful whether it is really a good idea to just
> cast messages to the error_logger. If we do, perhaps
> the error_logger should have a strategy for throwing
> stuff away if it gets severely backed up. After all,
> by killing the entire system, the information is lost
> Ulf W
> Ulf Wiger
> CTO, Erlang Training & Consulting Ltd
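The strategy suggested in the quoted message, throwing stuff away when the logger gets severely backed up, could look roughly like the sketch below. This is not error_logger's actual behaviour: the module name shed_logger, the {log, Msg} message shape, and the ?HIGH_WATER threshold are all hypothetical. Before handling each message, the logger checks its own queue length and discards the oldest queued log messages beyond the threshold without formatting them.

```erlang
-module(shed_logger).
-export([start/0]).

%% Hypothetical high-water mark for the logger's own message queue.
-define(HIGH_WATER, 1000).

start() ->
    spawn(fun loop/0).

loop() ->
    {message_queue_len, Len} =
        erlang:process_info(self(), message_queue_len),
    case Len > ?HIGH_WATER of
        true ->
            %% Shed the excess and record how much was lost.
            Dropped = drain(Len - ?HIGH_WATER, 0),
            io:format("logger overloaded, dropped ~p messages~n",
                      [Dropped]);
        false ->
            ok
    end,
    receive
        {log, Msg} ->
            io:format("~p~n", [Msg]),
            loop();
        _Other ->
            %% Ignore anything that is not a log message.
            loop()
    end.

%% Discard up to N queued log messages, oldest first, without
%% formatting them. Stops early if no log message is left.
drain(0, Count) ->
    Count;
drain(N, Count) ->
    receive
        {log, _} -> drain(N - 1, Count + 1)
    after 0 ->
        Count
    end.
```

Dropping without formatting matters here, since in the crash dump above the logger was busy in io_lib_pretty when it ran out of memory. Whether to drop the oldest or the newest messages is a design choice; this sketch drops the oldest.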