<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 13, 2015 at 3:39 PM, Youngkin, Rich <span dir="ltr">&lt;<a href="mailto:richard.youngkin@pearson.com" target="_blank">richard.youngkin@pearson.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>There's an upcoming leap second on June 30th. There's a bit of buzz about how it affects Linux and Java, as well as problems encountered in 2012 [1].</div></blockquote></div><br>In addition to what Rickard wrote:</div><div class="gmail_extra"><br></div><div class="gmail_extra">The two major problems to look out for are repetition and precision. Repetition happens because the POSIX clock doesn't understand leap seconds, so it repeats a second. If UTC is 58, 59, 60, 00, ... then POSIX will be N, N+1, N+2, N+2, ... Now, if you use the equivalent of `os:timestamp()` (pre-18) during the 60 and 00 seconds, you may get timestamps in the wrong order. Say you call os:timestamp() twice:</div><div class="gmail_extra"><br></div><div class="gmail_extra">TS1 = os:timestamp(),</div><div class="gmail_extra">...</div><div class="gmail_extra">TS2 = os:timestamp(),</div><div class="gmail_extra"><br></div><div class="gmail_extra">You expect TS1 &lt; TS2, but when the clock repeats a second, the sub-second part starts over and the invariant can break: for example, TS1 = {X, Y, 700000} and TS2 = {X, Y, 200000}. This leads to all kinds of trouble if you rely on time ordering in your system, and such errors can cascade through subsystems, creating an avalanche of failures that ultimately brings the system down.</div><div class="gmail_extra"><br></div><div class="gmail_extra">In pre-18 systems, erlang:now() performs what is nowadays called "smearing", so it is guaranteed to be monotonic and unique. 
This means the repetition problem above doesn't happen. From 18.x onwards, the new time API gives you far more insight into what happens to time in the system, so you can program your own solution, one that is correct for your problem. Also note that Google altered their NTP daemons to perform smearing system-wide for clusters where they knew it was not a problem.</div><div class="gmail_extra"><br></div><div class="gmail_extra">The other problem is precision. Some NTP daemons can't cope with leap seconds, so when one happens, they are "kicked" and lose time precision. Smearing also alters the clock speed, so 1000ms could suddenly be 1010ms or 1001ms in your system. For systems where high-resolution timing is a necessity, this is trouble. Air traffic control needs high precision because planes move 300m in 1 second. The same is true for high-speed trains. Manufacturing plants sometimes need high-precision timekeeping because of the work they do. Systems can suddenly be off from each other by up to a second, and this can end in disaster.</div><div class="gmail_extra"><br clear="all"><div>Erlang/OTP 18.x decouples monotonic time from the system time. This means you can use monotonic time when you need high-resolution, high-precision timing. Using time for event ordering is usually a programming mistake, because leap seconds violate the invariant that time always moves forwards. There are also subtle bugs to look out for. One: distributed systems can never use time as a resolver for ordering unless the drift of the clocks is known. Google's Spanner system employs GPS clocks in the data centers to make sure time is accurate, and can then make guarantees about a time window across data centers. The other bug is related to what Justin Sheehy so succinctly put as "There is no now". 
Imagine the following code:</div><div><br></div><div>TS = os:timestamp(),</div><div>&lt;Exprs&gt;</div><div>f(..., TS, ...)</div><div><br></div><div>If Erlang preempts you in &lt;Exprs&gt;, or the kernel does, then in principle any amount of time might pass between the definition of TS and its use inside 'f'. That is, any timestamp you draw always "lags behind" reality by some value ε, and in some cases this ε varies quite a lot, non-deterministically. If two Erlang processes both draw a timestamp in this fashion and only one of them gets blocked, the event ordering might be inverted from what you expect. Coping with this is a real problem. In cooperatively scheduled systems, Node.js for instance, this is less of a problem because a task always runs to completion, even after a kernel preemption. But in a fully preemptive system, like Erlang, Go or Haskell[0], this is something to look out for.</div><div><br></div><div>[0] Go preempts on function call boundaries nowadays, so it won't be preempted in a for (;;) { ... } loop. Haskell preempts on memory allocation. In practice, both models are fully preemptive. Erlang also preempts on function call boundaries, but its functional nature means that the end of any basic block of execution has a function call.</div><div><br></div><div><br></div>-- <br><div class="gmail_signature">J.</div>
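<div><br></div><div>As a rough sketch of the 18.x approach described above, and not a definitive recipe: measure durations on the monotonic clock, and order events on a single node with a monotonic unique integer instead of a wall-clock timestamp. The module name time_sketch and the functions elapsed_us/1, stamp/0 and newer/2 are made up for illustration; erlang:monotonic_time/0, erlang:convert_time_unit/3, erlang:unique_integer/1 and erlang:system_time/1 are part of the 18.x time API. The ordering only holds within one runtime system, so it does nothing for the distributed case.</div>
<pre>
-module(time_sketch).
-export([elapsed_us/1, stamp/0, newer/2]).

%% Run Fun and return {ElapsedMicroseconds, Result}.
%% Monotonic time never jumps backwards, so the difference is
%% non-negative even if the system clock is adjusted meanwhile.
elapsed_us(Fun) when is_function(Fun, 0) ->
    T0 = erlang:monotonic_time(),
    Result = Fun(),
    T1 = erlang:monotonic_time(),
    %% 1000000 parts per second, i.e. microseconds.
    {erlang:convert_time_unit(T1 - T0, native, 1000000), Result}.

%% Tag an event with {Seq, WallClockMicroseconds}.
%% Seq is strictly increasing on this node and is what we order by;
%% the wall-clock value is kept for display only.
stamp() ->
    {erlang:unique_integer([monotonic]), erlang:system_time(1000000)}.

%% True if the first stamp was drawn after the second one (same node only).
newer({SeqA, _WallA}, {SeqB, _WallB}) ->
    SeqA > SeqB.
</pre>
<div>For example, {Micros, ok} = time_sketch:elapsed_us(fun() -> timer:sleep(10) end) binds Micros to the elapsed time, and two successive time_sketch:stamp() calls on one node compare in the order they were drawn, whatever the system clock does in between. The "lags behind" problem remains: a stamp still only tells you when it was drawn, not when it is used.</div>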
</div></div>