[erlang-questions] heart restarting erlang node

Sun Jun 27 18:14:55 CEST 2010

Hi Scott,
Thanks for your very useful answers!
We found some segmentation errors reported by the OS so we were starting to
think that heart wasn't the problem after all.
This is proving difficult to pin down, as it's on a customer's site and
happens at very irregular intervals.
For anyone else experiencing similar problems, we'll inform the list if we
find a definitive solution.
//Tom.

On Sat, Jun 26, 2010 at 12:37 AM, Scott Lystig Fritchie <fritchie@REDACTED> wrote:

> tom kelly <ttom.kelly@REDACTED> wrote:
>
> tk> We've found this post from Serge Aleynikov which we're
> tk> investigating:
> tk>
> http://www.erlang.org/pipermail/erlang-questions/2006-December/024365.html
>
> tk> But I'm not yet sure it's the same issue. This can cause heart to
> tk> restart our system but only after memory usage was sustained around
> tk> 90% for 5-10 minutes which wasn't the case for all of our restarts.
>
> Tom, if your Erlang process is causing your OS to page VM to/from disk,
> then all expectations of soft realtime performance will be thrown out
> the window.  If the VM tries to do something simple like "char foo =
> *(some_pointer)", and if some_pointer points to a page that isn't
> resident in RAM, that thread will wait a *long* time before progress can
> be made again.  Typically you've got 1 scheduler thread per CPU, but if
> your working set isn't resident in RAM, you'll quickly block all
> scheduler threads...
>
> ... and then when it comes time to answer a heartbeat, you won't do it
> in time, and you'll be killed because you're too !@#$! slow.
>
> If you're using Linux, crank the /proc/vm/*swappiness* (I forget the
> exact path) down to 0.  Many kernels (RedHat comes to mind) use 60,
> which is not what you want a snappy server to do.
>
> If you can't blame your OS for moving your VM's pages out of RAM, you'll
> have to blame yourself: use less data or buy more RAM.  :-)
>
> -Scott
>
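
Scott's swappiness suggestion can be checked before changing anything. The
following is a minimal Python sketch, not something from the thread. It
assumes a Linux host where the setting is exposed as /proc/sys/vm/swappiness
and where /proc/vmstat carries the cumulative pswpin/pswpout page counters.
The function names are made up for illustration.

```python
from pathlib import Path

# Assumed locations on a typical Linux system; adjust if your kernel differs.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")
VMSTAT_PATH = Path("/proc/vmstat")


def parse_swappiness(text):
    """Parse the single integer held in the swappiness file."""
    return int(text.strip())


def parse_swap_counters(text):
    """Extract cumulative swap-in/swap-out page counts from /proc/vmstat text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def report():
    """Print the current swappiness and swap counters, if available."""
    if not SWAPPINESS_PATH.exists():
        print("no", SWAPPINESS_PATH, "here; probably not Linux")
        return
    print("vm.swappiness =", parse_swappiness(SWAPPINESS_PATH.read_text()))
    if VMSTAT_PATH.exists():
        print("swap counters:", parse_swap_counters(VMSTAT_PATH.read_text()))


if __name__ == "__main__":
    report()
```

Running it twice a few minutes apart while the node is under load gives a
rough indication: if pswpin and pswpout keep growing, the box is paging and
the missed heartbeats Scott describes become plausible. Lowering the value
is usually done as root with "sysctl -w vm.swappiness=0".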