[erlang-questions] BEAM micro-optimization: term comparison (> 0) versus pattern-matching

Richard A. O'Keefe ok@REDACTED
Wed Apr 19 05:56:27 CEST 2017


> On 16/04/2017, at 6:13 AM, Mikael Pettersson <mikpelinux@REDACTED> wrote:
> 
> Suppose a frequently called function returns a positive integer (always a fixnum)
> and a boolean "success" or "failure" indicator (even in the failure case the return
> value is significant and will be used).  Which is better from a performance
> perspective:

Measure it.  I will be astonished if it makes any practical difference.
You are likely to get more benefit from inlining, or caching,
or somehow reducing the number of calls.
> 
> 1. Tag the return value, {success, X} or {failure, Y}, and have callers pattern-match
>   on that
> 
> or,
> 
> 2. Indicate failure by negating Y, and have callers match on
> 
>      X when X > 0 -> success case;
>      MinusY -> failure case % MinusY =< 0 is implicit

You have omitted option 3:

   frequently_called(Inputs, Success, Failure) ->
      ... Success(Fixnum) ...
    ;
      ... Failure(Other_Fixnum) ...
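
To make that concrete, a minimal sketch of option 3 (the "N rem 2"
test and the continuations are invented for illustration):

    %% Hypothetical sketch of option 3: continuation-passing style.
    frequently_called(N, Success, Failure) when is_integer(N) ->
        case N rem 2 of
            0 -> Success(N);      % success: hand the fixnum to one fun
            _ -> Failure(-N)      % failure: hand the other fixnum over
        end.

    caller(N) ->
        frequently_called(N,
                          fun (X) -> {ok, X} end,
                          fun (Y) -> {error, Y} end).

Nothing is consed inside the function itself; each call site decides
what, if anything, to allocate.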

And option 4:

  - Have the success case(s) return the usual number.
  - Have the failure case(s) raise an exception containing the other number.
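
A sketch of option 4, again with an invented test standing in for the
real logic; this shape usually only pays when failure is rare:

    %% Hypothetical sketch of option 4: success returns the bare
    %% fixnum, failure raises an exception carrying the other number.
    frequently_called(N) when is_integer(N) ->
        case N rem 2 of
            0 -> N;
            _ -> throw({failure, -N})
        end.

    caller(N) ->
        try frequently_called(N) of
            X -> {ok, X}
        catch
            throw:{failure, Y} -> {error, Y}
        end.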

And option 5:
  - Have the success case return the usual number.
  - Have the failure case return {error,X} or {X} or something else.
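
At the call site, option 5 would look something like this (a sketch;
handle_success/1 and handle_failure/1 are placeholders, and the match
order matters because the success value is a bare integer):

    case frequently_called(N) of
        {error, Y}           -> handle_failure(Y);   % failure: one small tuple
        X when is_integer(X) -> handle_success(X)    % success: no allocation
    end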

And options 6 to 8 that I haven't thought of yet...

> 
> Option 2 avoids the consing overheads of option 1, but I'm worried that the X > 0
> guard may be less optimized than a plain structural pattern-match.  The last time
> I checked, term-comparisons would check inline for identity, and otherwise call
> the general cmp() function which would then do a lot of type-casing.

I am much more worried about the possibility of things going wrong.
I'm worried about returning zero or the wrong sign.
I'm worried about making it harder to get the most out of the Dialyzer.
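
To make the Dialyzer point concrete, compare the specs the two designs
admit (a sketch; f1 and f2 stand in for the two variants):

    %% Option 1: the spec can say exactly which tag carries what.
    -spec f1(term()) -> {success, pos_integer()} | {failure, integer()}.

    %% Option 2: the sign convention is invisible to the type system;
    %% both cases collapse into a single integer() return.
    -spec f2(term()) -> integer().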

Note that what is optimised and how well is something that *changes* over time.

Just to give you an idea of what's possible here, an optimisation I once
looked into (for a different language) goes like this:

If there are many calls to f(...) in contexts like

    case f(...) of {t1,...} -> ... ; ... ; {tn,...} -> ... end

then make two copies of f.  In one of the copies, each returned

    {tk,E1,...,Ex}

turns into

    R1,...,Rx := E1,...,Ex; return k

and the case turns into

    switch (f(...)) { ... case k: /* data are in R1,...,Rx */ ... }

This also applies to

    {tk,X1,...,Xx} = f(...)

which is a degenerate 'case'.  Use the other copy when the call is not
in such a context.

This was inspired by the compiler for Janus, IIRC, and was *partly*
motivated by multiple value returns in Common Lisp and Scheme.

I'm not saying that this will ever be done by an Erlang compiler,
but it *could* be done by an Erlang compiler, and it would make the
tuple version the cheapest available.
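
For reference, the source shape such a pass would look for is just the
ordinary option-1 idiom (a sketch with invented names):

    %% Every call site deconstructs the tuple at once, so a compiler
    %% could in principle return the tag as a branch index and the
    %% payload in a register, without ever building the tuple.
    frequently_called(N) ->
        case N rem 2 of
            0 -> {success, N};
            _ -> {failure, -N}
        end.

    caller(N) ->
        case frequently_called(N) of
            {success, X} -> X + 1;
            {failure, Y} -> -Y
        end.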

> In my pure Erlang code I'm using option 1, but I'm reimplementing the function
> as a NIF, and would like to avoid the consing overheads if that's indeed better.

I presume that under "consing overheads" you are including the cost of
cleaning up the dead tuples in garbage collection.

But seriously, there is no substitute for measurement.
If the function is costly enough to be worth reimplementing as a NIF,
this aspect of it is likely to be quite unimportant, but only
measurement can tell you for sure.
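
Something as crude as timer:tc/1 is a reasonable start (a sketch; run
each variant several times, on the release you actually deploy, and
treat the absolute numbers with suspicion):

    %% Time N calls of a fun and report the mean cost per call.
    bench(F, N) ->
        {Micros, ok} = timer:tc(fun () -> loop(F, N) end),
        Micros / N.   % mean microseconds per call

    loop(_F, 0) -> ok;
    loop(F, N)  -> _ = F(N), loop(F, N - 1).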

You have given us no idea what this function does.
Have you profiled the arguments it is given to see if it is often
called with similar enough arguments to make some kind of caching
attractive?
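
If the profile does show repeated arguments, a memoisation layer is only
a few lines, assuming the function is pure (the table name and shape
here are invented):

    %% Cache results of a pure frequently_called/1 in a named ETS table.
    init_cache() ->
        my_cache = ets:new(my_cache, [named_table, set, public]),
        ok.

    cached_call(Key) ->
        case ets:lookup(my_cache, Key) of
            [{_, V}] ->
                V;                                      % hit: no recomputation
            [] ->
                V = frequently_called(Key),
                true = ets:insert(my_cache, {Key, V}),  % miss: compute and store
                V
        end.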




