[erlang-questions] heart restarting erlang node

Sat Jun 26 00:37:11 CEST 2010

tom kelly <ttom.kelly@REDACTED> wrote:

tk> We've found this post from Serge Aleynikov which we're
tk> investigating:
tk> http://www.erlang.org/pipermail/erlang-questions/2006-December/024365.html

tk> But I'm not yet sure it's the same issue. This can cause heart to
tk> restart our system but only after memory usage was sustained around
tk> 90% for 5-10 minutes which wasn't the case for all of our restarts.

Tom, if your Erlang process is causing your OS to page virtual memory
to/from disk, then all expectations of soft realtime performance will be
thrown out the window.  If the Erlang VM tries to do something simple
like "char foo = *(some_pointer)", and if some_pointer points to a page
that isn't resident in RAM, that thread will wait a *long* time before
progress can be made again.  Typically you've got 1 scheduler thread per
CPU, but if your working set isn't resident in RAM, you'll quickly block
all scheduler threads...

... and then when it comes time to answer a heartbeat, you won't do it
in time, and you'll be killed because you're too !@#$! slow.

If you're using Linux, crank /proc/sys/vm/swappiness down to 0.  Many
kernels (RedHat comes to mind) default to 60, which is not what you
want on a snappy server.
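
For reference, on Linux the setting lives at /proc/sys/vm/swappiness.
Here is a rough sketch of a check you could run on the box before
blaming heart.  The function names and the threshold of 10 are my own
assumptions for illustration, not anything Erlang or the kernel
prescribes:

```python
from pathlib import Path

# Standard Linux location of the swappiness knob.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the contents of the swappiness file (a single integer)."""
    return int(text.strip())


def read_swappiness(path: Path = SWAPPINESS_PATH):
    """Return the current swappiness, or None if it cannot be read."""
    try:
        return parse_swappiness(path.read_text())
    except (OSError, ValueError):
        return None


def swappiness_advice(value, threshold: int = 10) -> str:
    """Turn a swappiness value into a short hint.

    The threshold is an arbitrary assumption; pick what suits your server.
    """
    if value is None:
        return "swappiness unavailable (not Linux, or /proc not readable)"
    if value <= threshold:
        return f"vm.swappiness={value}: ok"
    return f"vm.swappiness={value}: consider lowering it"


if __name__ == "__main__":
    print(swappiness_advice(read_swappiness()))
```

To actually change the value you need root, e.g. "sysctl -w
vm.swappiness=0", plus an entry in /etc/sysctl.conf if it should
survive a reboot.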

If you can't blame your OS for moving your VM's pages out of RAM, you'll
have to blame yourself: use less data or buy more RAM.  :-)

-Scott