[erlang-questions] The upcoming leap second

Youngkin, Rich richard.youngkin@REDACTED
Thu May 14 16:40:50 CEST 2015

Rickard & Jesper,

Thanks for the additional info, very useful. One item I'm concerned about
is UUID generation. I'll have to look into this a bit more. Thanks again
for your help!

On May 14, 2015 2:10 AM, "Jesper Louis Andersen" <
jesper.louis.andersen@REDACTED> wrote:

> On Wed, May 13, 2015 at 3:39 PM, Youngkin, Rich <
> richard.youngkin@REDACTED> wrote:
> >> There's an upcoming leap second on June 30th.  There's a bit of buzz
>> about how it affects Linux and Java, as well as problems encountered in
>> 2012 [1].
> In addition to what Rickard wrote:
> The two major problems to look out for are repetition and precision.
> Repetition happens because the POSIX clock doesn't understand leap seconds,
> so it repeats a second. If UTC is 58, 59, 60, 00, ... then POSIX will be N,
> N+1, N+2, N+2, ... Now, if you use the equivalent of `os:timestamp()`
> (pre-18) in the 60 and 00 seconds, you may get the wrong order. Say you
> call os:timestamp() twice:
> TS1 = os:timestamp(),
> ...
> TS2 = os:timestamp(),
> You expect TS1 < TS2, but when a second repeats, the fractional part can
> go backwards and this invariant suddenly breaks. For example, TS1 = {X, Y,
> 700000} and TS2 = {X, Y, 200000}. This leads to all kinds of trouble if you
> rely on time ordering in your system, and such errors can sometimes
> cascade through subsystems, creating an avalanche of failures that
> ultimately brings the system down.
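Here is a minimal sketch of the repetition described above. It assumes the simple mapping given in the text: UTC 23:59:60 and the following 00:00:00 share one POSIX second. Real kernels and NTP daemons may step or slew the clock differently. The module leap_repeat and its functions are hypothetical. The timestamps are built by hand rather than read from os:timestamp():

```erlang
-module(leap_repeat).
-export([posix_second/2, to_timestamp/2, demo/0]).

%% Hypothetical helper: POSIX seconds for the UTC wall-clock seconds around
%% a positive leap second, where N is the POSIX value of 23:59:58 UTC.
%% POSIX time has no 23:59:60, so that second and the following
%% 00:00:00 map to the same POSIX second, N + 2.
posix_second(N, {23, 59, 58}) -> N;
posix_second(N, {23, 59, 59}) -> N + 1;
posix_second(N, {23, 59, 60}) -> N + 2;
posix_second(N, {0, 0, 0})    -> N + 2.

%% Build an os:timestamp()-style {MegaSecs, Secs, MicroSecs} tuple.
to_timestamp(PosixSecs, MicroSecs) ->
    {PosixSecs div 1000000, PosixSecs rem 1000000, MicroSecs}.

%% TS1 is drawn 0.7 s into the leap second 23:59:60, and TS2 is drawn
%% 0.2 s into the next second, 00:00:00 (half a second later in reality).
demo() ->
    N = 1435708798, % 2015-06-30 23:59:58 UTC as POSIX time
    TS1 = to_timestamp(posix_second(N, {23, 59, 60}), 700000),
    TS2 = to_timestamp(posix_second(N, {0, 0, 0}), 200000),
    {TS1, TS2, TS1 < TS2}.
```

Erlang compares tuples element by element. As a result, TS1 < TS2 evaluates to false even though TS2 was drawn later.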
> In pre-18 systems, erlang:now() performs what is nowadays called
> "smearing", so it is guaranteed to be monotonic and unique. This means the
> above repetition problem doesn't happen. From 18.x onwards, the new
> Time API gives you far more insight into what happens to time in the
> system, so you can program your own solution, one that is correct for
> your problem. Also note that Google altered their NTP daemons to perform
> smearing system-wide for clusters where they knew it was not a problem.
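The 18.x Time API offers one way to get unique, strictly increasing ordering tokens without relying on erlang:now(), which is deprecated in 18: erlang:unique_integer/1 with the monotonic modifier. The following is a minimal sketch, and the module and function names are hypothetical:

```erlang
-module(ordering_tokens).
-export([next_token/0, strictly_increasing/1]).

%% Requires OTP 18+. Each call returns an integer that is unique within
%% this runtime system instance and larger than every value returned
%% by earlier calls there. It is not derived from wall-clock time.
next_token() ->
    erlang:unique_integer([monotonic, positive]).

%% Draw K tokens in sequence and check that they are strictly increasing.
strictly_increasing(K) ->
    Tokens = [next_token() || _ <- lists:seq(1, K)],
    Tokens =:= lists:usort(Tokens).
```

These values say nothing about wall-clock time. They are also only ordered within one runtime system instance (one node), so they do not help with ordering across nodes.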
> The other problem is precision. Some NTP daemons can't cope with leap
> seconds, so when one happens, they are "kicked" and lose time precision.
> Smearing also alters the clock speed, so 1000ms could suddenly be 1010ms or
> 1001ms in your system. For some systems, where high-resolution timing is a
> necessity, this is trouble. Air traffic control needs high precision
> because planes move 300m in 1 second. The same is true for high-speed
> trains. Manufacturing plants sometimes need high-precision timekeeping
> because of the work they do. Systems can suddenly be off from each other by
> up to a second, and this can end in disaster.
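The 1001ms and 1010ms figures follow from a linear smear. Assume one leap second is spread evenly over a window of W clock seconds. Each smeared second then lasts (W + 1)/W real seconds, so an interval the clock reports as 1000ms really takes 1000 + 1000/W ms. That gives 1001ms for W = 1000 and 1010ms for W = 100. The linear smear and these window sizes are assumptions, chosen only to reproduce the figures above. The module smear_math is hypothetical:

```erlang
-module(smear_math).
-export([real_ms/2]).

%% Assumes a linear smear: one inserted leap second is spread evenly over
%% WindowSecs seconds of the smeared clock, so each clock second lasts
%% (WindowSecs + 1) / WindowSecs real seconds. Returns a float.
real_ms(ClockMs, WindowSecs) when WindowSecs > 0 ->
    ClockMs * (WindowSecs + 1) / WindowSecs.
```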
> Erlang/OTP 18.x decouples monotonic time from system time. This means
> you can use monotonic time when you need high-resolution/precision
> timing. Using time for event ordering is usually a programming mistake,
> because leap seconds violate the invariant that time always moves
> forwards. There are also subtle bugs to look out for. One: distributed
> systems will never be able to use time to resolve ordering unless the
> drift between their clocks is known. Google's Spanner system employs GPS
> clocks in its data centers to keep time accurate, which lets it make
> guarantees about a time window across data centers. The other bug is
> related to what Justin Sheehy so succinctly put in the sentence
> "There is no now". Imagine the following code:
> TS = os:timestamp(),
> <Exprs>
> f(..., TS, ...)
> If Erlang preempts you in <Exprs>, or the kernel does the same, then in
> principle any amount of time might pass between the definition of TS and
> its use inside 'f'. That is, any timestamp you draw always "lags behind"
> reality by some value ε, and in some cases ε varies by quite a lot in
> a non-deterministic way. If two Erlang processes both draw a timestamp in
> this fashion and only one of them gets blocked, then the event ordering
> might be inverted from what you expect. Coping with this is a real problem.
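Here is a minimal sketch that makes ε visible, assuming OTP 18+. It draws a monotonic timestamp, runs a caller-supplied fun that stands in for <Exprs>, and reports how far monotonic time advanced before the timestamp would have been used. The module no_now is hypothetical. The result is in native time units, which erlang:convert_time_unit/3 can convert to other units. Measuring ε does not fix the reordering, but it shows that the lag is real and varies with whatever runs in between:

```erlang
-module(no_now).
-export([stale_by/1]).

%% Draw a monotonic timestamp, run Work (standing in for <Exprs>),
%% and return how much monotonic time passed before TS would be used.
%% Monotonic time never goes backwards, so the result is >= 0.
stale_by(Work) when is_function(Work, 0) ->
    TS = erlang:monotonic_time(),
    _ = Work(),
    erlang:monotonic_time() - TS.
```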
> In cooperatively scheduled systems, Node.js for instance, this is less of a
> problem because a task always runs to completion, even after a kernel
> preemption. But in a fully preemptive system, like Erlang, Go or
> Haskell[0], this is something to look out for.
> [0] Go preempts on function-call boundaries nowadays, so it won't be
> preempted in a for (;;) { ... } loop. Haskell preempts on memory
> allocation. In practice, both models are fully preemptive. Erlang also
> preempts on function-call boundaries, but its functional nature means
> that every basic block of execution ends in a function call.
> --
> J.