[erlang-questions] Does erlang:now() guarantee that subsequent calls to this BIF returns continuously increasing values even in erlang cluster?
Fred Hebert
mononcqc@REDACTED
Tue Apr 21 14:20:18 CEST 2015
On 04/21, Michael Turner wrote:
>"Lamport/vector clocks and other similar ones operate on *causality*, but
>this partial ordering is not the only one available or workable."
>
>Whether it's "workable" depends on what's desired. Sorting by {Node,
>Timestamp} is not accurate if causality matters and clocks have drifted out
>of synch. As they will. Hence Lamport's work, and the work of others. And
>if causality doesn't matter, well, I wonder: why bother? Unless you just
>want a rough idea of when certain things happened, in which case {Node,
>Timestamp} can give you a /total/ order that's, if anything, more accurate
>than what you need.
>
That's not necessarily true. Let's look at the different options and when
they can be useful.
- Lamport/vector clocks: causality. I want to track the logical
dependencies of changes.
- `{Node, Timestamp}`: I have lots of local events (say HTTP requests
and responses in logs) and want to see *when* they happen and how far
apart. The timestamp might need to be monotonic, but the per-node
value lets me impose a logical order, track some density over time
(assuming I at least have NTP working), and so on.
- {Shard, Timestamp}: I require a total order, but for events within a
sharded data set.
- {Cluster, Timestamp}: Each cluster I run might belong to specific
customers or whatever, or run a specific set of hardware, or be a
logical division. In any case, it's possible they have their own time
or id service and I may want a partial or total order based on the
events within that cluster, without worrying I might want to compare
cross-cluster activity.
- {Region, Timestamp}: Similar to the above, but by geographical area. I
might decide that I need a total order on some form of transactions
and will run a service, but for latency (and if the real world allows it),
I won't try to synchronize my time across data-centers or large
geographical areas.
All of these 'labelled timestamps' *are* a partial order. They only
define it on some label. I.e. you can sort all timestamps within a
node/shard/cluster/region, but can't do it across boundaries.
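To make the "partial order on a label" idea concrete, here is a minimal sketch in Python. The `Stamp` type and the `compare` and `sort_per_label` helpers are hypothetical names made up for illustration; the only point is that a comparison is defined within a label and undefined across labels.

```python
from typing import NamedTuple, Optional


class Stamp(NamedTuple):
    label: str  # node, shard, cluster, or region name (hypothetical)
    time: int   # timestamp from that label's own clock or id service


def compare(a: Stamp, b: Stamp) -> Optional[int]:
    """Return -1, 0, or 1 within one label; None when the labels differ."""
    if a.label != b.label:
        return None  # incomparable: the order is not defined across labels
    return (a.time > b.time) - (a.time < b.time)


def sort_per_label(stamps):
    """Group stamps by label and sort each group on its own."""
    groups = {}
    for s in stamps:
        groups.setdefault(s.label, []).append(s)
    return {label: sorted(group, key=lambda s: s.time)
            for label, group in groups.items()}
```

Sorting by the full `(label, time)` tuple would still produce a total order, but the relative position of two events with different labels in that order carries no meaning about when they happened.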
There are other avenues that even combine some of them. One interesting
case is inspired by Google's Chubby and CRDTs: you use a timestamp
synchronized by NTP, which guarantees you a maximal drift interval. You then
add in a Lamport clock whenever two events happen so close together that
the system clocks cannot guarantee they truly happened apart.
The Lamport clock is mergeable in a deterministic way that is also
commutative and idempotent (that's a CRDT!), and acts as a tie-breaker
between events that happen too close together.
This way you get reliable timestamps when you can, and when you suddenly
can't, you get a form of causality (or global monotonicity) to break
things up.
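A rough sketch of that combination, in Python, might look like the following. Everything here is an assumption made for illustration: the 10 ms drift bound, the `Stamp` layout, and the function names. It is not how any particular system implements it. The pairwise comparison is also only a sketch: it is not guaranteed to be transitive across arbitrary triples of events.

```python
from typing import NamedTuple

MAX_DRIFT_MS = 10  # assumed bound on clock disagreement between nodes


class Stamp(NamedTuple):
    wall_ms: int  # NTP-synchronized wall-clock reading, in milliseconds
    counter: int  # Lamport counter carried along as the tie-breaker


def merge(local: int, remote: int) -> int:
    """Merge two Lamport counters; max is commutative and idempotent."""
    return max(local, remote)


def tick(counter: int) -> int:
    """Advance the counter for a new local event."""
    return counter + 1


def on_receive(local: int, remote: int) -> int:
    """Receiving a message: merge the sender's counter, then tick."""
    return tick(merge(local, remote))


def happened_before(a: Stamp, b: Stamp) -> bool:
    """Trust wall clocks when they are far enough apart, else the counter."""
    if abs(a.wall_ms - b.wall_ms) > MAX_DRIFT_MS:
        return a.wall_ms < b.wall_ms
    return a.counter < b.counter
```

For example, `happened_before(Stamp(1000, 5), Stamp(2000, 1))` is decided by the wall clocks alone, while `happened_before(Stamp(1004, 2), Stamp(1000, 3))` falls inside the drift window and is decided by the counters.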
Slapping "Lamport clock" on it is reductive. It's a good way to track
some levels of causality, but it has its limitations. If you only *need*
node-local accuracy and you have access to a monotonic clock, it might
be far less work to just slap the monotonic clock into things than to weave
the logical clock through everything, and you obtain the same logical result
in the end (plus more information). Maybe it's not the best solution
either.
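As a small illustration of the node-local case, here is a sketch in Python. The `NODE` name and the `stamp` helper are hypothetical, and `time.monotonic_ns()` merely stands in for whatever monotonic clock your platform offers. Its readings never go backwards, but they are only comparable within the same running node.

```python
import time

NODE = "node_a"  # hypothetical node name


def stamp(event):
    """Tag an event with the node name and a node-local monotonic reading."""
    return (NODE, time.monotonic_ns(), event)


# Local events come out in order, and the gap between them is meaningful.
log = [stamp("request"), stamp("response")]
elapsed_ns = log[1][1] - log[0][1]
```

Nothing has to be threaded through messages or merged on receipt here, which is the "far less work" part. The price is that readings from two different nodes tell you nothing about each other.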
But really, if we want to make good recommendations, we have to ask what
the user needs. Not come with a pet solution to push through.