ERTS timeouts don't account for computer suspension

Sun May 31 00:59:42 CEST 2020

On 5/10/20 3:27 PM, Rickard Green wrote:
> On Sun, May 10, 2020 at 11:05 PM Rickard Green <rickard@REDACTED 
> <mailto:rickard@REDACTED>> wrote:
>
>     On Sun, May 10, 2020 at 9:20 PM Guilherme Andrade <g@REDACTED
>     <mailto:g@REDACTED>> wrote:
>
>         Hello list,
>
>         I've recently run into an unexpected behaviour of erlcron[1] -
>         it doesn't account for any time period the computer spends
>         suspended - scheduled events get delayed by equal amounts.
>
>         This is because it leverages gen_server timeouts which, in
>         turn, use bog-standard VM timeouts, which behave in the way I
>         described. I've confirmed[2] their behaviour by comparing time
>         elapsed until timeout according to erlang:monotonic_time/1 vs.
>         os:system_time/1.
>
>         After delving into the Erlang/OTP source code for a while I
>         realized there are two configuration options[3] that appear to
>         control the unexpected behaviour:
>         "--enable-prefer-elapsed-monotonic-time-during-suspend" and
>         "--disable-prefer-elapsed-monotonic-time-during-suspend",
>         introduced[4] in OTP 18.0.2.
>
>         The default value of the setting configured by either of those
>         two options is "no"[5] - that is, elapsed monotonic time
>         during suspend is not preferred.
>
>         I would like to know the rationale for this. Is it for:
>         - performance reasons? (say, because it's faster and server
>         hardware rarely suspends, if at all)
>         - intended behaviour? (contrary to my expectations)
>
>         I'd be thankful for your thoughts on this matter and whether
>         you think the default behaviour should be changed.
>
>         [1]: https://github.com/erlware/erlcron
>         [2]: https://gist.github.com/g-andrade/508c779a931dde14c22c6e96319caa24
>         [3]: http://erlang.org/documentation/doc-10.5/doc/installation_guide/INSTALL.html#configuring
>         [4]: https://github.com/erlang/otp/commit/4a864c1cbe16a42f3f5190881187e3c9849e985f
>         [5]: https://github.com/erlang/otp/blob/OTP-22.3.4/erts/aclocal.m4#L2355-L2358
>
>         -- 
>         Guilherme
>
>
>     Yes, for performance reasons. See the release note for OTP 18.0.2
>     <http://erlang.org/download/OTP-18.0.2.README>:
>
>        OTP-12895    Application(s): erts
>
>                     *** POTENTIAL INCOMPATIBILITY ***
>
>                     Changed default OS monotonic clock source chosen at
>                     build time. This in order to improve performance. The
>                     behavior will now on most systems be that (both OS and
>                     Erlang) monotonic time stops when the system is
>                     suspended.
>
>                     If you prefer that monotonic time elapse during suspend
>                     of the machine, you can pass the command line argument
>                     --enable-prefer-elapsed-monotonic-time-during-suspend
>                     to configure when building Erlang/OTP. The
>                     configuration stage will try to find such a clock
>                     source, but might not be able to find it. Note that
>                     there might be a performance penalty associated with
>                     such a clock source.
>
>
>
>     Regards,
>     Rickard
>     -- 
>     Rickard Green, Erlang/OTP, Ericsson AB
>
>
>
> I missed this part:
>
> > I'd be thankful for your thoughts on this matter and whether you think
> > the default behaviour should be changed.
>
> I do *not* think the default should be changed. I would be very 
> surprised if it is more than a very small minority of users that are 
> interested in elapsed time during OS suspend. More or less everyone 
> would have to take this performance penalty without any benefit. I 
> don't remember any figures since this was 5 years ago, but I remember 
> that the performance penalty was quite substantial.
>
Isn't execution in a container a decent use case for the
--enable-prefer-elapsed-monotonic-time-during-suspend configure flag? It
would avoid timers expiring while the process gets no CPU time (timeouts
fire too early, so execution becomes erroneous). When BEAM runs inside a
VM, the clocksource can be set to kvm-clock on Linux, which avoids the
problem, but containers are unable to do that. Has
--enable-prefer-elapsed-monotonic-time-during-suspend been tested with
containers (such as LXC) to show more reliable operation? I understand it
would require the container system to be aware of the load on the system
and play nice with the processes involved (sending them SIGSTOP and
SIGCONT); I'm just not sure how common that is in normal container
operation, as opposed to manual container commands.
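The distinction discussed in this thread can be observed at the OS level. What follows is a minimal sketch that assumes Linux and Python 3.7+, where `time.CLOCK_BOOTTIME` is exposed. It samples `CLOCK_MONOTONIC`, which on Linux does not count time spent suspended, and `CLOCK_BOOTTIME`, which does. The helper name `clock_deltas` is hypothetical. This thread does not state which OS clock a given ERTS build ends up using; that depends on what the configure stage found.

```python
import time


def clock_deltas(duration: float) -> dict:
    """Sleep for `duration` seconds and return how far each clock advanced.

    Hypothetical helper for illustration only. On Linux, CLOCK_MONOTONIC
    does not count time spent suspended, while CLOCK_BOOTTIME does; if the
    host is suspended during the sleep, the "boottime" delta exceeds the
    "monotonic" delta by roughly the length of the suspend.
    """
    clocks = {"monotonic": time.CLOCK_MONOTONIC}
    # CLOCK_BOOTTIME is Linux-only; skip it on platforms that lack it.
    if hasattr(time, "CLOCK_BOOTTIME"):
        clocks["boottime"] = time.CLOCK_BOOTTIME

    start = {name: time.clock_gettime(cid) for name, cid in clocks.items()}
    time.sleep(duration)
    end = {name: time.clock_gettime(cid) for name, cid in clocks.items()}
    return {name: end[name] - start[name] for name in clocks}


if __name__ == "__main__":
    # Suspend the machine during this sleep to see the two clocks diverge.
    for name, seconds in clock_deltas(30.0).items():
        print(f"{name:>9}: {seconds:.3f} s")
```

If nothing suspends the host, the two figures should agree closely. If the machine is suspended mid-sleep, the sleep itself is stretched in wall-clock terms, much like the delayed gen_server timeouts described at the top of the thread, and only the boottime figure includes the suspended period.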

Thanks,
Michael

