[erlang-questions] The upcoming leap second

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Thu May 14 10:10:32 CEST 2015


On Wed, May 13, 2015 at 3:39 PM, Youngkin, Rich <
richard.youngkin@REDACTED> wrote:

> There's an upcoming leap second on June 30th.  There's a bit of buzz about
> how it affects Linux and Java, as well as problems encountered in 2012 [1].
>

In addition to what Rickard wrote:

The two major problems to look out for are repetition and precision.
Repetition happens because the POSIX clock doesn't understand leap seconds,
so it repeats a second. If UTC is 58, 59, 60, 00, ... then POSIX will be N,
N+1, N+2, N+2, ... Now, if you use the equivalent of `os:timestamp()`
(pre-18) in the 60 and 00 seconds, then you may get the wrong order. Say
you call os:timestamp() twice:

TS1 = os:timestamp(),
...
TS2 = os:timestamp(),

You expect TS1 < TS2, but when the second repeats, the sub-second part
starts over and the invariant suddenly breaks, for example with TS1 = {X,
Y, 700000} and TS2 = {X, Y, 200000}. This leads to all kinds of trouble if
you rely on time ordering in your system, and such errors can sometimes
cascade through subsystems, creating an avalanche of failures that
ultimately brings the system down.
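
To make the broken invariant above concrete, here is a minimal sketch. The
module name and the sample values are made up for illustration; the only
thing it relies on is that Erlang compares tuples element by element.

```erlang
-module(leap_order).
-export([in_order/2, demo/0]).

%% True when the first os:timestamp()-style tuple is not later than the
%% second. {MegaSecs, Secs, MicroSecs} tuples sort chronologically only as
%% long as the underlying clock never repeats or steps back.
in_order(TS1, TS2) ->
    TS1 =< TS2.

demo() ->
    %% Hypothetical samples: both land in the same POSIX second because
    %% that second is repeated for the leap second.
    TS1 = {1435, 708800, 700000},  % drawn at 23:59:60.7 UTC
    TS2 = {1435, 708800, 200000},  % drawn half a second later, at 00:00:00.2
    in_order(TS1, TS2).
```

Here leap_order:demo() returns false, even though TS2 was drawn after TS1.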

In pre-18 systems, erlang:now() performs what is nowadays called
"smearing", so it is guaranteed to be monotonic and unique. This means the
above repetition problem doesn't happen. From 18.x onwards, the new Time
API gives you far more insight into what happens to time in the system, so
you are able to program your own solution, one which is correct for your
problem. Also note that Google altered their NTP daemons to perform
smearing system-wide for clusters where they knew it was not a problem.
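
As a rough sketch of what the 18.x Time API lets you do, assuming OTP 18 or
later: the module and function names below are my own, only the erlang:*
calls are part of the API. It measures elapsed time on the monotonic clock
and builds an ordering key that does not depend on the wall clock.

```erlang
-module(time_api_sketch).
-export([elapsed_us/1, event_key/0]).

%% Run Fun and return how long it took in microseconds, measured on the
%% Erlang monotonic clock, which never moves backwards.
elapsed_us(Fun) when is_function(Fun, 0) ->
    T0 = erlang:monotonic_time(),
    _ = Fun(),
    T1 = erlang:monotonic_time(),
    %% 1000000 parts per second, i.e. microseconds.
    erlang:convert_time_unit(T1 - T0, native, 1000000).

%% A key for ordering events within one runtime instance. The unique
%% integer is strictly increasing, so two keys are never equal and a later
%% call always yields a larger key.
event_key() ->
    {erlang:monotonic_time(), erlang:unique_integer([monotonic])}.
```

Note that such keys are only meaningful inside a single node. They say
nothing about ordering between nodes.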

The other problem is precision. Some NTP daemons can't cope with leap
seconds, so when one happens, they are "kicked" and lose time precision.
Smearing also alters the clock speed, so 1000ms could suddenly be 1010ms or
1001ms in your system. For systems where high-resolution timing is a
necessity, this is trouble. Air traffic control needs high precision
because planes move 300m in 1 second. The same is true for high-speed
trains. Manufacturing plants sometimes need high-precision timekeeping
because of the work they do. Systems can suddenly be off from each other by
up to a second, and this can end in disaster.
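
The 1010ms and 1001ms figures above correspond to slowing the clock by 1%
and 0.1% respectively. A small sketch of that arithmetic, with rates given
in parts per million; the module name and the rates are illustrative
assumptions, not what any particular NTP daemon uses:

```erlang
-module(smear_math).
-export([stretched_ms/2, seconds_to_absorb/1]).

%% Real duration, in ms, of a nominal interval of Ms milliseconds while the
%% clock is slowed by Ppm parts per million.
stretched_ms(Ms, Ppm) ->
    Ms + Ms * Ppm div 1000000.

%% Seconds of smearing needed to absorb one whole leap second at Ppm.
seconds_to_absorb(Ppm) ->
    1000000 div Ppm.
```

With these definitions, smear_math:stretched_ms(1000, 10000) is 1010 and
the leap second is absorbed in 100 seconds, while
smear_math:stretched_ms(1000, 1000) is 1001 and absorbing it takes 1000
seconds. A gentler smear distorts intervals less but lasts longer.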

Erlang/OTP 18.x decouples monotonic time from the system time. This means
you can use monotonic time when you need high-resolution, high-precision
timing. Using time for event ordering is usually a programming mistake
because leap seconds violate the invariant that time always moves
forwards. Also, there are two subtle bugs to look out for. One, distributed
systems will never be able to use time as a resolver for ordering unless
the drift between the clocks is known. Google's Spanner system employs GPS
clocks in the data centers to make sure time is accurate, and can then make
guarantees about a time window across data centers. The other bug is
related to what Justin Sheehy so succinctly put as "There is no now".
Imagine the following code:

TS = os:timestamp(),
<Exprs>
f(..., TS, ...)

If Erlang preempts you in <Exprs>, or the kernel does the same, then in
principle any amount of time might pass between the definition of TS and
its use inside 'f'. That is, any timestamp you draw always "lags behind"
reality by some value ε, and in some cases this ε varies by quite a lot in
a non-deterministic way. If two Erlang processes both draw a timestamp in
this fashion and only one of them gets blocked, then the event ordering
might be inverted from what you expect. Coping with this is a real problem.
In cooperatively scheduled systems, Node.js for instance, this is less of a
problem because a task always runs to completion, even after a kernel
preemption. But in a fully preemptive system, like Erlang, Go or
Haskell[0], this is something to look out for.

[0] Go preempts on a function call boundary nowadays. So it won't be
preempted in a for (;;) { ... } loop. Haskell preempts on memory
allocation. In practice, both models are fully preemptive. Erlang also
preempts on a funcall boundary, but its functional nature means that the
end of any basic block of execution has a function call.
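
To get a feel for the lag ε described above, here is a minimal sketch. The
module name is hypothetical and Fun stands in for <Exprs>. It draws a
timestamp, runs Fun, and reports how stale the timestamp has become at the
point where it would be handed to 'f':

```erlang
-module(no_now).
-export([lag_us/1]).

%% Returns the age of the timestamp, in microseconds, once Fun has run.
%% timer:now_diff/2 computes the difference between two
%% {MegaSecs, Secs, MicroSecs} tuples.
lag_us(Fun) when is_function(Fun, 0) ->
    TS = os:timestamp(),
    _ = Fun(),
    timer:now_diff(os:timestamp(), TS).
```

Calling no_now:lag_us(fun() -> timer:sleep(50) end) will typically report
something a bit above 50000, and the exact value changes from run to run
with scheduling. Since it uses os:timestamp(), the result is itself subject
to the leap second problem and can even come out negative if the clock is
stepped in between.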


-- 
J.