[erlang-questions] Why Beam.smp crashes when memory is over?
Tue Nov 10 14:40:52 CET 2009
Joe Armstrong wrote:
> Just killing processes when they have done nothing wrong is not a good idea.
Well, it's optional, of course. :)
Imagine, OTOH, a well-tested system where memory characteristics
have been well charted for the foreseeable cases. It might be
defensible to set resource limits so that everything we expect
to see falls well within the limit, and stuff that we don't
expect might trigger the killing of some process. If this is
done on temporary processes, we should be able to accept it
as long as the number of spurious kills is low.
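To make the policy above concrete, here is a minimal Python sketch, not anything that exists in OTP as of this writing. It assumes we can observe each process's heap size and know whether it is temporary. The names `Proc`, `select_victims` and `spurious_kill_rate` are hypothetical. Only temporary processes over the limit are selected, and the spurious-kill rate is the figure we would want to keep low. (Later OTP releases did add a per-process `max_heap_size` flag in this spirit.)

```python
from dataclasses import dataclass


@dataclass
class Proc:
    # Hypothetical snapshot of a process, for illustration only.
    pid: str
    heap_words: int
    temporary: bool  # True if killing it is acceptable (no restart expected)


def select_victims(procs, limit_words):
    """Return pids of temporary processes whose heap exceeds the limit.

    Non-temporary processes are never selected, even when over the limit.
    """
    return [p.pid for p in procs if p.temporary and p.heap_words > limit_words]


def spurious_kill_rate(killed, genuinely_faulty):
    """Fraction of killed processes that were not actually faulty."""
    if not killed:
        return 0.0
    spurious = [pid for pid in killed if pid not in genuinely_faulty]
    return len(spurious) / len(killed)
```

The limit would be chosen from the charted memory profile so that expected behaviour falls well below it. The spurious-kill rate is then a measure of how often the unexpected-but-harmless case gets hit.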
This is not much stranger than things that we already do routinely:
- If dets or disk_log notices that a file hasn't been properly
closed, it 'repairs' the file; that is, it repairs the index.
Corrupt objects are simply discarded, not repaired.
- Replication in the AXD 301 and similar products was asynchronous
with a bulking factor. Some failure cases could lead to dropped
calls, but as long as they were few, it was acceptable.
- Some complex state machines would bail out for unexpected
sequences (I showed an example of this in my Structured Network
Programming talk at EUC). This was a form of "complexity
overload", and hugely unfair to the poor process running the
code, as it was probably not a real failure case.
- Mnesia's deadlock prevention algorithm, or indeed any deadlock
prevention algo, will restart transactions if there is even
the smallest chance of deadlock. Granted, this should be
transparent if the transaction fun is well written, but there
will be false positives, and this will affect performance.
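The false positives mentioned for deadlock prevention can be illustrated with the classic wait-die scheme. This is a hypothetical sketch using exclusive locks only, and it is not a claim about how Mnesia's lock manager is actually implemented. Each transaction carries a start timestamp (smaller means older). An older requester waits for a younger holder, and a younger requester is restarted. No cycle detection is done, so a younger transaction is restarted even when no deadlock would have occurred.

```python
def on_conflict(requester_ts, holder_ts):
    """Wait-die rule: older requesters wait, younger ones are restarted."""
    return "wait" if requester_ts < holder_ts else "restart"


class LockManager:
    """Toy exclusive-lock table keyed by item (hypothetical, illustration only)."""

    def __init__(self):
        self.holders = {}  # item -> timestamp of the transaction holding the lock

    def acquire(self, item, ts):
        holder = self.holders.get(item)
        if holder is None or holder == ts:
            self.holders[item] = ts
            return "granted"
        return on_conflict(ts, holder)

    def release_all(self, ts):
        # Drop every lock held by the transaction with timestamp ts.
        for item in [i for i, h in self.holders.items() if h == ts]:
            del self.holders[item]
```

For example, if transaction 1 holds `x` and transaction 2 requests it, transaction 2 is told to restart, even though simply waiting for transaction 1 to finish would have been safe. That is the kind of false positive that costs performance but not correctness, provided the transaction fun can safely be re-run.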
On the other hand, there can be situations where a rogue process
gobbles up all available memory, rendering the VM unresponsive
for several minutes (e.g. due to the infamous "loss of sharing"),
or cases where a number of unexpectedly large processes "gang up"
and kill the VM in one big memory spike. Or there may be a
difficult-to-reproduce bug that sends some application into an
infinite retry loop, rendering the system unusable. In all these cases, killing off
the poor culprits, guilty or not, may well result in a less deadly
disturbance for the system as a whole.
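For the "gang up" case, a simple load-shedding rule might kill the largest processes first until total memory is back under a threshold, regardless of whether each one is actually at fault. The sketch below assumes a per-process memory snapshot is available. The function `shed_load` and its parameters are hypothetical.

```python
def shed_load(proc_sizes, total_limit):
    """Pick processes to kill, largest first, until total <= total_limit.

    proc_sizes: dict mapping pid -> memory in bytes (hypothetical snapshot).
    Returns the list of pids selected, in the order they would be killed.
    """
    total = sum(proc_sizes.values())
    victims = []
    for pid, size in sorted(proc_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if total <= total_limit:
            break
        victims.append(pid)
        total -= size
    return victims
```

Largest-first keeps the number of victims small, which matters when some of them are innocent. A real policy would probably also exempt permanent or system processes.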
CTO, Erlang Training & Consulting Ltd