[erlang-questions] Must and May convention

Thu Sep 28 14:42:33 CEST 2017

On 09/28, Joe Armstrong wrote:
>
>I actually quite liked the Elixir pipe operator - so I changed the
>Erlang parser and messed with this a bit.
>
> ...
>
>I can't make my mind up about this - the version with pipes is slightly longer
>but seems very readable to me. Trouble is, if it crashes due to a non existent
>file, I can't see which file it is ...
>

A distinction between a function signature that returns `{ok, _} | 
{error, _}' or one that raises exceptions is in the expectation of how 
recoverable an error should be to the caller.

First let's look at our 3 exception types:

1. throws: should be used for non-local returns. I argue that throws 
should never escape the module they were declared in without very 
explicit documentation, and they should be mostly forgettable to your 
callers.

2. errors: a condition prevents the code from doing its work, and the 
expectation is that a programmer should modify code for things to keep 
going. Those are calls you can catch and analyze, but aside from 
situations where you're actively trying to work around limitations in 
library code (or you're writing tests/logging code), you should want to 
avoid that kind of catching.

3. exits: the process should stop running.
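
To make the three classes concrete, here is a minimal sketch that runs a 
fun and reports which class it raised. The module and function names 
(`exc_classes', `classify/1', `demo/0') are made up for illustration:

```erlang
-module(exc_classes).
-export([classify/1, demo/0]).

%% Runs Fun and reports its return value, or the class and reason of
%% whatever exception it raised.
classify(Fun) ->
    try Fun() of
        Value -> {returned, Value}
    catch
        throw:Term   -> {throw, Term};    % non-local return
        error:Reason -> {error, Reason};  % the code cannot do its work
        exit:Reason  -> {exit, Reason}    % the process wants to stop
    end.

demo() ->
    [classify(fun() -> throw(not_found) end),
     classify(fun() -> error(badarg) end),
     classify(fun() -> exit(shutdown) end),
     classify(fun() -> ok end)].
```

Catching all three classes like this is the kind of thing you would 
normally only do in test or logging code.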

This sheds some light, IMO, on when to use `{ok, _} | {error, _}' or 
just raise an exception: if the condition is something you could 
foresee as reasonable as a library writer and for which corrective 
action is plausible to undertake at runtime by the caller, then wrapping 
results in `{ok, _} | {error, _}' makes sense.

The wrapping of results lets the caller make their own decisions 
regarding the acceptability of a condition. If they think it's fair to 
expect various values, a `case ... of' may be used; they can otherwise 
just do a strict match and fail, elevating the unexpected return value 
to the exception level (`error') on their own.
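
As a small sketch of those two choices, using `file:read_file/1' (the 
module and function names here are hypothetical):

```erlang
-module(read_choice).
-export([read_or_default/2, read_strict/1]).

%% The caller considers a missing file an expected condition and recovers.
%% Any other {error, _} is not handled and fails with a case_clause error.
read_or_default(Path, Default) ->
    case file:read_file(Path) of
        {ok, Bin}       -> Bin;
        {error, enoent} -> Default
    end.

%% The caller considers any failure unrecoverable: the strict match turns
%% {error, Reason} into a badmatch error.
read_strict(Path) ->
    {ok, Bin} = file:read_file(Path),
    Bin.
```

Both callers use the same library function; only the caller's pattern 
decides whether a missing file is a value or an exception.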

The interesting variation within your two code samples comes from the 
handling, whether implicit or not, of these conditions:

    test2(F) ->
        F  |> file:read_file()
           |> ok()
           |> tokenize()
           |> parse().

    test2(F) ->
        {ok, B} = file:read_file(F),
        T = tokenize(B),
        parse(T).

    ok({ok,T}) -> T.

The interesting thing that happens in the latter case is that the 
assertion that `B` must be there is very explicit and parseable 
visually. In the piped case, you must manually transform the 
unrecoverable condition into an exception through an explicit check.  
There's no big difference between that and:

    test2(F) ->
        B = ok(file:read_file(F)),
        T = tokenize(B),
        parse(T).

The awkwardness of either solution, I think, comes from the fact that 
you're taking composition and using it to handle control flow. Taken to 
a greater extreme, you could imagine:

    check_a(...) -> {ok, X};
    check_a(...) -> {error, E}.

    check_b(...) -> {ok, X};
    check_b(...) -> {error, E}.
    ...
    check_z(...) -> {ok, X};
    check_z(...) -> {error, E}.

Handling those with a pipe would remain very awkward:

    testN(F) ->
        F |> check_a()
          |> ok()
          |> check_b()
          |> ok()
          |> ...
          |> check_z()
          |> ok()
          |> do_something().

Clearly, the pipe is not the right tool for the conditional job. That's 
roughly where the maybe monad and similar tools come to help. Let's define 
a new pipe for the sake of the argument: ||> will either unpack the `X' 
from `{ok,X}' and pass it on, or exit as soon as possible:

    testN(F) ->
        F ||> check_a()
          ||> check_b()
          ...
          ||> check_z()
          ||> do_something().

Now it tends to work pretty well. The weakness though is that you may 
have a case where *some* conditions can be handled and some can't, or 
where only *some* conditions need asserting and in other cases you don't 
need them. How workable would it be to have a thing like:

    test3(F) ->
        F ||> file:read_file()
           |> tokenize()
           |> parse().

Where both operators compose within the same flow and fairly 
transparently? You of course still lose the property of knowing which 
operation failed and caused a bail out, but that is possibly more a 
weakness of using a pure opaque flow to handle things.
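
Without touching the parser, the mixed flow can be approximated with a 
plain function over a list of steps. This is only a sketch under my own 
assumptions: the `{checked, Fun}' tag stands in for a `||>' step (the 
result of Fun is unwrapped, and the first `{error, E}' is returned as 
is), a bare fun stands in for a `|>' step, and the final value is 
wrapped in `{ok, _}'. None of these names come from an existing library:

```erlang
-module(mixed_pipe).
-export([run/2, demo/0]).

%% Threads Init through Steps, left to right.
%% {checked, Fun}: Fun must return {ok, X} | {error, E}; X is passed on,
%%                 and the first {error, E} stops the flow and is returned.
%% Fun:            the return value is passed on unchanged.
run(Init, Steps) ->
    run_steps(Steps, Init).

run_steps([], Acc) ->
    {ok, Acc};
run_steps([{checked, Fun} | Rest], Acc) ->
    case Fun(Acc) of
        {ok, Next}       -> run_steps(Rest, Next);
        {error, _} = Err -> Err
    end;
run_steps([Fun | Rest], Acc) when is_function(Fun, 1) ->
    run_steps(Rest, Fun(Acc)).

demo() ->
    %% A checked step that may fail, followed by an unchecked step.
    Parse = fun(S) ->
                try {ok, list_to_integer(S)}
                catch error:badarg -> {error, {not_an_integer, S}}
                end
            end,
    Double = fun(N) -> N * 2 end,
    {run("21", [{checked, Parse}, Double]),
     run("x",  [{checked, Parse}, Double])}.
```

As with the operators, the error value alone does not tell you which 
step produced it.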

I tried to play a bit with that in fancyflow[1] which failed a bit 
because I didn't want to do anything but use parse transforms and it 
doesn't look nearly as pretty:

    test4(F) ->
        [pipe](F,
               [maybe](_, file:read_file(_)),
               tokenize(_),
               parse(_)).

It does, however, have the advantage of being able to use arbitrary 
argument positions and even repeat the argument in multiple places at 
the call site.

I'm not arguing fancyflow should be a thing people use in their everyday 
life, but it proved an interesting experimentation device. For 
example, I also managed to add

    test5() ->
        [F1,F2|T] = [parallel](f1(),
                               f2(),
                               f3(),
                               2+5).

under a similar form. As opposed to using a given operator, the verbose 
format allows very clear composability rules (you can make parallel 
validations of pipes using 'maybe's without confusion), and can be 
extended for all kinds of operations.

The interesting aspect of it is the ability to (as with monads) define a 
specific execution context around various expressions telling you which 
transformations should be applied between each of them.

Therefore, nothing would really prevent us from doing something like:

    testN(F) ->
        [verbose_maybe](F,
                        check_a(_),
                        check_b(_),
                        ...
                        check_z(_),
                        do_something(_)).

Where rather than returning `{ok,X} | {error,X}' with `{error,X}' being 
the literal return value of any of the check functions, it instead 
returns something like `{error, {verbose_maybe, Line, "check_b(_)"}, X}' 
(or whatever other format could be judged useful), allowing you to keep 
local information about the workflow.
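
A runtime approximation of that idea, assuming each step is handed over 
as a `{Label, Fun}' pair. The labels are supplied by hand here, and the 
`Line' element is left out since a plain function has no access to the 
call site; a parse transform would be needed for that. All names are 
hypothetical:

```erlang
-module(verbose_pipe).
-export([run/2, demo/0]).

%% Each step is {Label, Fun}, where Fun returns {ok, X} | {error, Reason}.
%% On failure, the label of the failing step is reported with the reason.
run(Value, []) ->
    {ok, Value};
run(Value, [{Label, Fun} | Rest]) ->
    case Fun(Value) of
        {ok, Next}      -> run(Next, Rest);
        {error, Reason} -> {error, {verbose_maybe, Label}, Reason}
    end.

demo() ->
    Positive = fun(N) when N > 0 -> {ok, N};
                  (N)            -> {error, {not_positive, N}}
               end,
    Even = fun(N) when N rem 2 =:= 0 -> {ok, N};
              (N)                    -> {error, {odd, N}}
           end,
    Steps = [{"positive(_)", Positive}, {"even(_)", Even}],
    {run(4, Steps), run(3, Steps)}.
```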

Or why not just turn to exceptions?

    testN(F) ->
        [ok](F,
             check_a(_),
             check_b(_),
             ...
             check_z(_),
             do_something(_)).

This format could, for example, just apply your 'ok/1' function 
in-between any function call listed there, yielding full blown exceptions 
every time something is off.
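
Sketched as a fold, assuming every step returns `{ok, X}' on success and 
that `ok/1' is applied after each one, the last included (`run/2' and 
`half/1' are made-up names):

```erlang
-module(ok_pipe).
-export([ok/1, run/2, half/1]).

%% Unwraps {ok, T}; anything else fails with a function_clause error.
ok({ok, T}) -> T.

%% Applies each fun in turn, asserting an {ok, _} result after every step.
run(Init, Funs) ->
    lists:foldl(fun(Fun, Acc) -> ok(Fun(Acc)) end, Init, Funs).

%% Example step: succeeds only on even integers.
half(N) when N rem 2 =:= 0 -> {ok, N div 2};
half(N)                    -> {error, {odd, N}}.
```

With these definitions, `ok_pipe:run(8, [fun ok_pipe:half/1, fun 
ok_pipe:half/1])' returns 2, while starting from 6 raises an error on 
the second step because `ok/1' has no clause for `{error, {odd, 3}}'.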

The problem is, of course, the cost of letting someone write such 
abstractions. With monads in a language like Haskell, it is extremely 
cheap; syntax never changes, only the context definitions.
With parse transforms in Erlang like in fancyflow, it's easy to read new 
forms, but a pain in the ass to extend.  With custom operators in a 
strict language like Erlang, it's years of work.
Even though custom operators in a macro-friendly language like 
Elixir (or a language friendly to custom operators like Scala) are easy 
to add, they come with a huge cognitive cost to the community since 
anyone can reinvent them and they're always shitty to decipher.

It's fun to think about though!

Regards,
Fred.

[1]: https://github.com/ferd/fancyflow