[erlang-questions] seq_trace with bigint labels

Fri Jan 19 23:33:36 CET 2018

I've gone back and forth for a while now on how seq_trace, http://erlang.org/doc/man/seq_trace.html, could best be used, if at all, to benefit distributed tracing libraries.

In particular I've been working on the Erlang implementation of OpenCensus (http://opencensus.io), https://github.com/census-instrumentation/opencensus-erlang. OpenCensus was recently announced on the Google open source blog: https://opensource.googleblog.com/2018/01/opencensus.html. But this is not unique to Census; the OpenTracing (http://opentracing.io) implementation Otter, https://github.com/Bluehouse-Technology/otter/, works in a similar manner.

A trace must track a trace id and span id and propagate them to children. Both opencensus and Otter provide functionality to track the context in a variable or in the process dictionary. But when it comes to message passing, the only option is to add the context to the message as an extra variable.

Libraries that are instrumented internally can do this without modifying the user's API. For example, in Erleans we have 'call' extract the trace context behind the scenes and pass it through gen_statem:call, https://github.com/SpaceTime-IoT/erleans/blob/master/src/erleans_grain.erl#L142
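To make the explicit approach concrete, here is a minimal sketch of carrying a context in the process dictionary and wrapping it into each message by hand. The module name, the ?KEY atom, the {with_ctx, Ctx, Msg} wrapper and the {TraceId, SpanId} tuple are all made up for illustration; this is not the opencensus, Otter or Erleans API.

```erlang
-module(ctx_sketch).
-export([set_ctx/1, get_ctx/0, send/2, demo/0]).

%% Hypothetical process dictionary key, chosen only for this sketch.
-define(KEY, '$trace_ctx_sketch').

%% Track the context for the current process in the process dictionary.
set_ctx(Ctx) -> put(?KEY, Ctx).
get_ctx() -> get(?KEY).

%% Message passing needs the context added to the message explicitly.
send(Pid, Msg) ->
    Pid ! {with_ctx, get_ctx(), Msg},
    ok.

%% The receiver has to know about the wrapper and restore the context.
demo() ->
    Parent = self(),
    set_ctx({38995684955506843782500595084643303673, 1}),
    Child = spawn(fun() ->
        receive
            {with_ctx, Ctx, Msg} ->
                set_ctx(Ctx),
                Parent ! {seen, get_ctx(), Msg}
        end
    end),
    ok = send(Child, hello),
    receive
        {seen, SeenCtx, SeenMsg} -> {SeenCtx, SeenMsg}
    after 1000 ->
        timeout
    end.
```

Both sides have to agree on the wrapper, which is exactly the kind of change to user code that carrying the context in the background would avoid.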

If it were possible to carry the trace context through the message pass in the background in general, this would open up the possibility of easier instrumentation with few changes to a user's code.

This is essentially what seq_trace already does. The first issue I hit when looking at using it to carry a census trace id (a 128 bit integer) is that while the docs say:

> The label component is an integer which identifies all events belonging to the same sequential trace.

'integer' here does not mean any Erlang integer:

seq_trace:set_token(label, 38995684955506843782500595084643303673).
** exception error: bad argument

Looking in the seq_trace code I found it does an `is_small` check on the integer, resulting in the badarg for integers the size of a census trace id.
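For a rough idea of what passes that check, the sketch below tests whether an integer fits in the VM's immediate "small" representation. The bit widths are an assumption on my part (60-bit signed on a 64-bit emulator, 28-bit signed on a 32-bit one), not something stated in the seq_trace docs, and the module and function names are made up.

```erlang
-module(small_sketch).
-export([fits_small/1]).

%% Assumed width of a small integer, keyed on the word size in bytes.
small_bits() ->
    case erlang:system_info(wordsize) of
        8 -> 60;
        4 -> 28
    end.

%% True if N is within the assumed signed small-integer range.
fits_small(N) when is_integer(N) ->
    Bits = small_bits(),
    Min = -(1 bsl (Bits - 1)),
    Max = (1 bsl (Bits - 1)) - 1,
    N >= Min andalso N =< Max.
```

Under that assumption a 128 bit trace id is always a bignum, so it can never be accepted as a label as the check stands.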

So, finally, the main point of this email is really just to raise the idea of increasing the size of allowable labels in seq_trace.

Otherwise we must keep track of a label -> trace mapping in a central process or ets table. The first issue that solution would have to solve is what happens when a message goes to another node where that label mapping doesn't exist.
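A minimal sketch of the ets variant of that workaround, assuming a node-local named table and a simple counter for allocating small labels. The module, table and function names are hypothetical.

```erlang
-module(label_map_sketch).
-export([init/0, new_label/1, lookup/1]).

%% Hypothetical table name; the table is local to the node that creates it.
-define(TAB, trace_label_map_sketch).

%% Create the table and seed the label counter.
init() ->
    ?TAB = ets:new(?TAB, [named_table, public, set]),
    true = ets:insert(?TAB, {next_label, 0}),
    ok.

%% Allocate a small integer label and remember which trace id it stands for.
new_label(TraceId) ->
    Label = ets:update_counter(?TAB, next_label, 1),
    true = ets:insert(?TAB, {Label, TraceId}),
    Label.

%% Map a label back to its trace id, if this node knows about it.
lookup(Label) ->
    case ets:lookup(?TAB, Label) of
        [{Label, TraceId}] -> {ok, TraceId};
        [] -> not_found
    end.
```

The label returned by new_label/1 is small enough to pass to seq_trace:set_token(label, Label). A lookup/1 on any other node returns not_found, though, which is the cross-node problem described above.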

I'm interested to hear if this is a possibility, what other blockers might arise from using seq_trace like this, if someone has an alternative idea or if someone just thinks this is a bad idea in general :)

-- 
  Tristan Sloughter
  "I am not a crackpot" - Abe Simpson
  t@REDACTED