[erlang-questions] Lockfree now()

Thu Jul 12 14:58:31 CEST 2012

When I implemented now() in Erjang, I found a lock-free algorithm for it
which has the right "timestamp must be monotonically increasing" property.
I did wonder at the time whether it would make sense to suggest using it in
BEAM as well. I didn't (although I was proud of the simple solution I'd
discovered); mostly because the corresponding BEAM source, with all its
smoothing algorithms and support for different platforms, was perhaps a bit
intimidating.
Now, triggered by the mention on this list that the mutex used in now() is
actually being a bottleneck (and having spent the past couple of months
ruthlessly refactoring other people's organically grown code), I dare
suggest it :-)

The algorithm could be used as-is to at least reduce the extent of the
timeofday section in get_now(), if used for adjusting for that property
after a timestamp has been obtained.
(This is assuming that we have true atomic operations of at least 64-bit
width; this assumption may of course fail on some platforms. Among the
supported (or interesting) platforms, do any have SMP but not such atomic
operations?)
It would be even better to eliminate the locking in (the common code path
of) get_now() altogether, but there are quite a few variants for obtaining
the timestamp (3 implementations of get_tolerant_timeofday()).

The timer reading and smoothing code seems to have complexity of the "don't
change unless you're really sure what you're doing" kind; bearing that in
mind, here's what I could imagine for locking-reducing improvements:

0) The code as it is now has two parts: a "read timer, adjust timer value
according to state, and adjust state" part, and a "ensure monotonicity"
part.
1) Split the former into a lock-less "read+adjust" part, and a locked
"adjust state" part which is done seldomly. This can be done if the state
for adjusting can be read atomically.
2) The main algorithm for now() is then:
// Warning - pseudo-code ahead - I hope it's not too detailed for the
purpose
{
  read = adjusted_timer_value();
  if (time_to_adjust_timer_state(read)) {
    lock()
    adjust_timer_state(read);
    unlock()
    time = adjusted_timer_value() // Re-read
  } else time = read;
  return ensure_monotonically_increasing_now(time)
}

3) Here, time_to_adjust_timer_state() is also lock-free, taking advantage
of the fact that it doesn't matter if two threads get a "yes" answer at the
same time; if nothing else, the condition can be re-tested in
adjust_timer_state().

static erts_smp_atomic_t  adjust_time;
bool time_to_adjust_timer_state(long read_time) {
 return erts_smp_atomic_read(adjust_time) - read_time > ADJUST_INTERVAL;
}

adjust_time is then updated at the end of adjust_timer_state().

4) Finally, the ensure_monotonically_increasing_now() simply implements the
algorithm I discovered for Erjang; here it is in the current Java
expression:
{ // parameter is named 'micros'
                 long prev;
                while ((prev = latest_now.get()) < micros) {
                        if (latest_now.compareAndSet(prev,micros)) {
                                return micros;
                        }
                }
                return latest_now.incrementAndGet();
}

So, there it is: ensure that the state needed for the timer correction is
readable in a lock-free manner; that it is updated within mutex protection
but seldomly; and that timer reading and adjustment itself does not change
the global state. Then the common path can use just atomic ops, not locks.
The Erjang implementation of now() is quite a bit faster than that of BEAM.
It doesn't have the same smoothing logic, of course, but I like to believe
that the lack of locking is the major reason.

Would the proposed change be worth it, do you think?

2012/7/12 Wei Cao <cyg.cao@REDACTED>
[snip]

Finally, I use lcnt to locate all lock collisions in erlang VM, found
> the mutex timeofday is the bottleneck.
>
>                                                           lock
>                 location  #tries  #collisions  collisions [%]  time
> [us]  duration [%]
>
>     -----                        --------- ------- ------------
> --------------- ---------- -------------
>
> timeofday        'beam/erl_time_sup.c':939  895234       551957
>  61.6551    3185159       23.5296
>
> timeofday        'beam/erl_time_sup.c':971  408006       264498
>  64.8270    1473816       10.8874
>
>
> the mutex timeofday is locked each time erts_check_io is invoked to
> "sync the machine's idea of time", erts_check_io is executed hundreds
> of thounds of times per second, so it's locked too much times, hence
> reduce performance.
>
> I solved this problem by moving the sync operation into a standalone
> thread, invoked 1 time per millisecond
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20120712/11baea22/attachment.htm>