[erlang-questions] heart restarting erlang node
Scott Lystig Fritchie
fritchie@REDACTED
Sat Jun 26 00:37:11 CEST 2010
tom kelly <ttom.kelly@REDACTED> wrote:
tk> We've found this post from Serge Aleynikov which we're
tk> investigating:
tk> http://www.erlang.org/pipermail/erlang-questions/2006-December/024365.html
tk> But I'm not yet sure it's the same issue. This can cause heart to
tk> restart our system but only after memory usage was sustained around
tk> 90% for 5-10 minutes which wasn't the case for all of our restarts.
Tom, if your Erlang process is causing your OS to page the VM's memory
to and from disk, then all expectations of soft realtime performance go
out the window. If the VM tries to do something simple like "char foo =
*(some_pointer)", and if some_pointer points to a page that isn't
resident in RAM, that thread will wait a *long* time before progress can
be made again. Typically you've got 1 scheduler thread per CPU, but if
your working set isn't resident in RAM, you'll quickly block all
scheduler threads...
... and then when it comes time to answer a heartbeat, you won't do it
in time, and you'll be killed because you're too !@#$! slow.
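To make the timing argument concrete, here is a minimal Python model of the watchdog idea behind heart. The monitored node must send a beat at least once every `timeout` seconds, and the watchdog fires at the first deadline that passes without one. The function name and parameters are hypothetical. The 60-second default is meant to mirror heart's default `HEART_BEAT_TIMEOUT`, but this is an illustration of the logic, not heart's actual code.

```python
from typing import Iterable, Optional


def watchdog_fire_time(beats: Iterable[float], until: float,
                       timeout: float = 60.0,
                       start: float = 0.0) -> Optional[float]:
    """Return when a heart-style watchdog would fire, or None if it doesn't
    fire before `until`.

    Hypothetical model: the first deadline is start + timeout, and each beat
    that arrives on or before the current deadline pushes it to beat + timeout.
    """
    deadline = start + timeout
    for t in sorted(beats):
        if t > deadline:
            # This beat came too late: the node was already killed at `deadline`.
            return deadline
        deadline = t + timeout
    # No beat rescued the last deadline; it fires only if it falls in the window.
    return None if deadline > until else deadline


if __name__ == "__main__":
    # Healthy node: beats every 10 s, nothing fires within the first 50 s.
    print(watchdog_fire_time([10, 20, 30], until=50))
    # Node whose schedulers stall on page faults between t=20 and t=110:
    # the watchdog fires at t=80, long before the late beat shows up.
    print(watchdog_fire_time([10, 20, 110], until=200))
```

The point of the second case is that a single long stall is enough. It doesn't matter how punctual the beats were before or after it.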
If you're using Linux, crank the vm.swappiness setting (the file is
/proc/sys/vm/swappiness) down to 0. Many kernels (Red Hat's come to mind)
default to 60, which makes the kernel far more eager to swap out process
memory than you want on a snappy server.
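A small Python sketch for checking the current value follows. The helper names are hypothetical. It reads `/proc/sys/vm/swappiness`, the same knob exposed as the `vm.swappiness` sysctl. To change it at runtime you would run `sysctl -w vm.swappiness=0` as root, and to make it persist across reboots you would put `vm.swappiness = 0` in `/etc/sysctl.conf`.

```python
from pathlib import Path
from typing import Optional

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the contents of /proc/sys/vm/swappiness (a single integer)."""
    value = int(text.strip())
    if value < 0:
        raise ValueError(f"unexpected swappiness value: {value}")
    return value


def read_swappiness(path: Path = SWAPPINESS_PATH) -> Optional[int]:
    """Return the current swappiness, or None if it can't be read (e.g. not Linux)."""
    try:
        return parse_swappiness(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_swappiness()
    if current is None:
        print("swappiness not available (not Linux?)")
    elif current > 0:
        print(f"vm.swappiness = {current}; consider lowering it "
              "(e.g. `sysctl -w vm.swappiness=0` as root)")
    else:
        print("vm.swappiness is already 0")
```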
If you can't blame your OS for moving your VM's pages out of RAM, you'll
have to blame yourself: use less data or buy more RAM. :-)
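Before choosing between those remedies, it may help to confirm that the VM really is being paged out. On reasonably recent Linux kernels, `/proc/<pid>/status` includes a `VmSwap` line reporting how much of a process is currently swapped out. Below is a minimal Python sketch that reads it. The helper names are hypothetical, and the emulator's OS pid can be obtained from the Erlang shell with `os:getpid()`.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Matches a line such as "VmSwap:\t    5120 kB" in /proc/<pid>/status.
_VMSWAP_RE = re.compile(r"^VmSwap:\s+(\d+)\s+kB\s*$", re.MULTILINE)


def parse_vmswap_kb(status_text: str) -> Optional[int]:
    """Return the VmSwap value in kB from /proc/<pid>/status text, or None."""
    match = _VMSWAP_RE.search(status_text)
    return int(match.group(1)) if match else None


def swapped_kb(pid: int) -> Optional[int]:
    """How much of process `pid` is swapped out, in kB (Linux only, else None)."""
    try:
        text = Path(f"/proc/{pid}/status").read_text()
    except OSError:
        return None
    return parse_vmswap_kb(text)


if __name__ == "__main__":
    pid = int(sys.argv[1])  # e.g. the value returned by os:getpid() in the node
    kb = swapped_kb(pid)
    if kb is None:
        print(f"no VmSwap info for pid {pid}")
    else:
        print(f"pid {pid} has {kb} kB swapped out")
```

A nonzero and growing `VmSwap` while the node is busy is a strong hint that the working set no longer fits in RAM.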
-Scott