New EEP draft: Pinning operator ^ in patterns

Thu Jan 21 15:41:00 CET 2021

My own favourite quote by ROK is "an example would be useful about now", so
in the spirit of that, I converted the OTP codebase to use ^-annotations
everywhere, to get something concrete. This took me all of one afternoon: I
rebuilt OTP from scratch with the warnings enabled, collected the warnings
and converted them into a naive sed script that did the edits. Then I had
to do some manual cleanup for those cases where my script had been too
naive. Today I also took care of converting the test suites, which I missed
in the first pass. (The test suites added about 2000 further uses of
already-bound variables.)

To look at the diff, you can browse
https://github.com/richcarl/otp/commits/pinning-otp
or if you prefer, fetch it to your otp repo:

        git fetch https://github.com/richcarl/otp.git pinning-otp
        git log FETCH_HEAD
        git show FETCH_HEAD^       (for the normal modules)
        git show FETCH_HEAD         (for the testsuites)

Note though that I did this as quickly as possible just to make it pass the
build. To do a conversion like this for real, the annotated code should at
least be superficially inspected by someone who knows it, because I think
some of the existing uses are wrong or at least not quite what you want to
do, and this would be your main chance to find and correct them.

On the whole, I find the annotated code so much more readable (in those
places where already-bound variables are used), that it's not even funny
when I think about all the time lost over the years to staring at the code
and trying to see which variables are already bound and how that affects
the control flow. With ^-annotations, these uses stick out very clearly
even when quickly scanning the code in less or git diff.

Here follow some observations I've made from just looking at this diff for
things that stand out as odd:

--------------------------------------------------
Multiple pinned variables on same line - immediately visible what's
happening

    receive
        {'DOWN', ^Ref, process, ^Proc, _Info} ->
            badarg;

    receive
        {ssh_cm, ^SSH, {open, ^Chn, RemoteChn, {session}}} ->

The alternative way with temporary variables and guards are much harder to
follow:

    receive
        {'DOWN', Ref1, process, Proc1, _Info} when Ref =:= Ref1, Proc =:=
Proc1->
            badarg;

    receive
        {ssh_cm, SSH1, {open, Chn1, RemoteChn, {session}}} when SSH1 =:=
SSH, Chn1 =:= Chn->

--------------------------------------------------
Some comments existed mainly because the use was not obvious:

    #{Start:=FromPos} = SPos,               %Assertion.

    {Name, N, Reply} ->  %% Name is bound !!!

Annotations are checkable comments:

    #{Start:=^FromPos} = SPos,

    {^Name, ^N, Reply} ->

--------------------------------------------------
Some possible bugs immediately stand out when ^-annotated:

    [L1,_L2|^Rest] when is_list(L1) -> ...

The rest of the list is alreay bound? Seems fishy.

    case lists:keyfind(AppName, 1, StopRunning) of
        {^_AppName, ^Id} -> ...

An underscore-prefixed variable which is already bound? Probably working by
luck, not by intention.

    parse_top(Line0, DecodeOpts, D) ->
        {Label,Line1} = get_label(Line0),
        {Term,Line,^D} = parse_term(Line1, DecodeOpts, D),

Did the author really expect parse_term to return the same D here?

^E0=processed_whole_element(S,Pos,Name,Attrs1,Lang,Parents,NSI,Namespace),

This whole line turned out to occur twice in the function body, but gets
the same
result both times, thankfully.

--------------------------------------------------
Some weird code becomes obvious when annotated.

What does this line do?

    _ = [M = M:module_info(module) || M <- Needed],

Oh, it's a multi-assertion!

    _ = [^M = M:module_info(module) || M <- Needed],

--------------------------------------------------
Some uses indicate that you should probably have written this in a less
cute way:

    {M, ^F, A} = MFA = {cerl:atom_val(cerl:call_module(Guard)), F,
length(Args)},

is easier to read and maintain as:

    M = cerl:atom_val(cerl:call_module(Guard)),
    A = length(Args),
    MFA = {M, F, A},

--------------------------------------------------
Some cases are just very unclear if they are intentional or not, until you
have a full understanding of what the code does:

    select_bin_seg(#k_val_clause{val=#k_bin_int{size=Sz,unit=U,flags=Fs,
                                                val=Val,next=Next},
                                 body=B},
                   #k_var{}=Src, Fail, St0) ->
        Ctx = get_context(Src, St0),
        {Mis,St1} = select_extract_int(Next, Val, Sz, U, Fs, Fail,
                                         Ctx, St0),
        {Bis,St} = match_cg(B, Fail, St1),
        Is = case Mis ++ Bis of

 [#b_set{op=bs_match,args=[#b_literal{val=string},OtherCtx1,Bin1]},
                  #b_set{op={succeeded,guard},dst=Bool1},
                  #b_br{bool=Bool1,succ=Succ,fail=Fail},
                  {label,Succ},

#b_set{op=bs_match,dst=Dst,args=[#b_literal{val=string},_OtherCtx2,Bin2]}|
                  [#b_set{op={succeeded,guard},dst=Bool2},
                   #b_br{bool=Bool2,fail=Fail}|_]=Is0] ->
                  ...

Is the use of Fail in the patterns above intentional and important?
There are no comments to guide you.

In the following, is the duplicated call to ssa_args/2 (with identical
arguments) and the match on the already bound As with the (hopefully same)
result of the second call intentional?

    test_cg(Test, Inverted, As0, Fail, St0) ->
        As = ssa_args(As0, St0),
        case {Test,ssa_args(As0, St0)} of

{is_record,[Tuple,#b_literal{val=Atom}=Tag,#b_literal{val=Int}=Arity]}
              when is_atom(Atom), is_integer(Int) ->
                false = Inverted,                   %Assertion.
                test_is_record_cg(Fail, Tuple, Tag, Arity, St0);
            {_,As} ->
                {Bool,St1} = new_ssa_var('@ssa_bool', St0),
                ...
        end.

In the below, is the assertion intentional, or should it have been a new
variable "Plt1 = InitState#st.plt"?

    get_warnings(Callgraph, Plt, DocPlt, Codeserver,
                 TimingServer, Solvers, Parent) ->
        InitState =
          init_state_and_get_success_typings(Callgraph, Plt, Codeserver,
                                             TimingServer, Solvers, Parent),
        Mods = dialyzer_callgraph:modules(InitState#st.callgraph),
        Plt = InitState#st.plt,
        CWarns =
          dialyzer_contracts:get_invalid_contract_warnings(Mods,
Codeserver, Plt),
        ...

Can you spot the assertion in the below?

    do_init_trans_id_counter(ConnHandle, Item, Incr) ->
        case megaco_config:lookup_local_conn(ConnHandle) of
            [] ->
                {error, {no_such_connection, ConnHandle}};
            [ConnData] ->
                %% Make sure that the counter still does not exist
                LocalMid = ConnHandle#megaco_conn_handle.local_mid,
                Min      = user_info(LocalMid, min_trans_id),
                Max      =
                    case ConnData#conn_data.max_serial of
                        infinity ->
                            4294967295;
                        MS ->
                            MS
                    end,
                Item     = ?TID_CNT(LocalMid),
                Incr2    = {2, Incr, Max, Min},
                case (catch ets:update_counter(megaco_config, Item, Incr2))
of
                ...

Is the assertion below intentional, or a left-over from some old
refactoring?
(There are no other calls to write_to_store()).

                ...
                Oid = {Tab, element(2, Val)},
                case LockKind of
                    write ->
                        mnesia_locker:wlock(Tid, Store, Oid);
                    sticky_write ->
                        mnesia_locker:sticky_wlock(Tid, Store, Oid);
                    _ ->
                        abort({bad_type, Tab, LockKind})
                end,
                write_to_store(Tab, Store, Oid, Val);
            ...

    write_to_store(Tab, Store, Oid, Val) ->
        {_, _, Type} = mnesia_lib:validate_record(Tab, Val),
        Oid = {Tab, element(2, Val)},
        ...

This double file header check is probably intentional, but a comment would
have helped. With a ^-annotation on the second H0, the intent would have
been clear:

            try dets_v9:check_file_header(FH, Fd) of
                {ok, H0} ->
                    case dets_v9:check_file_header(FH, Fd) of
                        {ok, H0} ->

        /Richard

Den fre 15 jan. 2021 kl 13:34 skrev Richard Carlsson <
carlsson.richard@REDACTED>:

> There have been many strong reactions in this thread, so let me give you
> some statistics to show how much this feature of using bound variables is
> actually used in practice. I checked the entire OTP codebase: there are
> just over 1300 modules, and in total about 595000 variable occurrences in
> patterns, of which only 3350 are already bound.. That makes 0.56% of all
> variables in patterns - about once in 200 to make it simple. On average,
> that's 2-3 usages per module - some modules using it more and some not
> using it at all.
>
> I find it hard to see, then, why it should be a big issue to ask
> programmers to annotate these few occurrences for readability and
> maintainability. It's certainly not as big of a change as for example when
> the warning for unused variables, unless prefixed with _, was made the
> default.
>
> Imagine a world where Erlang had not allowed already-bound variables in
> patterns (forcing you to use the idiom "X1 when X1 =:= X -> ...", as in
> e.g. Haskell), and that someone now came with the suggestion that to make
> things simpler, we could just implicitly match on the value of X if X is
> already bound. The old me from my university days would probably have said
> "that's really elegant, let's do it". But the
> maintainability-and-readability me, with experience of very large code
> bases, large numbers of developers, and many relative newcomers to the
> language, would say "aw hell no". This is a cute feature, but it carries a
> large cognitive cost and is not worth having compared to how relatively
> little it is used. Being explicit about intention is much more important.
>
>         /Richard
>
>
> Den tors 24 dec. 2020 kl 21:10 skrev Richard Carlsson <
> carlsson.richard@REDACTED>:
>
>> The ^ operator allows you to annotate already-bound pattern variables as
>> ^X, like in Elixir. This is less error prone when code is being refactored
>> and moved around so that variables previously new in a pattern may become
>> bound, or vice versa, and makes it easier for the reader to see the intent
>> of the code.
>>
>> See also https://github.com/erlang/otp/pull/2951
>>
>> Ho ho ho,
>>
>>         /Richard & the good folks at WhatsApp
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20210121/0b0de607/attachment.htm>