New EEP draft: Pinning operator ^ in patterns

Torben Hoffmann torben.lehoff@REDACTED
Fri Jan 22 09:37:32 CET 2021


These examples show very clearly (IMO) that in practice matching with an
already bound variable is open to very subtle bugs.

For my own code it is less of an issue since I strive to write small
functions, but in a lot of code functions are really long - that just
happens.

I'll repeat myself, but the fact that pinning allows you to be very precise
about the _intention_ of your match is a good thing.
As late as yesterday I did a refactoring where a ^ would have made my
intentional re-use of an already bound variable in a function head easier
to read.

Adding more guards in a Haskell style does not provide the same sort of
easy communication of intention wrt a match.
Warnings going in that direction would be something I'd always turn off -
that is simply a bad idea IMO.

I was initially on the fence about this, but a better way of showing
intention and all the real-life examples has convinced me that I am in
favour of this feature.

I don't care if ^ is used or some other symbol. I just want the clearer
intention.

Cheers,
Torben

On Thu, 21 Jan 2021 at 15:41, Richard Carlsson <carlsson.richard@REDACTED>
wrote:

> My own favourite quote by ROK is "an example would be useful about now",
> so in the spirit of that, I converted the OTP codebase to use ^-annotations
> everywhere, to get something concrete. This took me all of one afternoon: I
> rebuilt OTP from scratch with the warnings enabled, collected the warnings
> and converted them into a naive sed script that did the edits. Then I had
> to do some manual cleanup for those cases where my script had been too
> naive. Today I also took care of converting the test suites, which I missed
> in the first pass. (The test suites added about 2000 further uses of
> already-bound variables.)
>
> To look at the diff, you can browse
> https://github.com/richcarl/otp/commits/pinning-otp
> or if you prefer, fetch it to your otp repo:
>
>         git fetch https://github.com/richcarl/otp.git pinning-otp
>         git log FETCH_HEAD
>         git show FETCH_HEAD^       (for the normal modules)
>         git show FETCH_HEAD         (for the testsuites)
>
> Note though that I did this as quickly as possible just to make it pass
> the build. To do a conversion like this for real, the annotated code should
> at least be superficially inspected by someone who knows it, because I
> think some of the existing uses are wrong or at least not quite what you
> want to do, and this would be your main chance to find and correct them.
>
> On the whole, I find the annotated code so much more readable (in those
> places where already-bound variables are used), that it's not even funny
> when I think about all the time lost over the years to staring at the code
> and trying to see which variables are already bound and how that affects
> the control flow. With ^-annotations, these uses stick out very clearly
> even when quickly scanning the code in less or git diff.
>
> Here follow some observations I've made from just looking at this diff for
> things that stand out as odd:
>
> --------------------------------------------------
> Multiple pinned variables on same line - immediately visible what's
> happening
>
>     receive
>         {'DOWN', ^Ref, process, ^Proc, _Info} ->
>             badarg;
>
>     receive
>         {ssh_cm, ^SSH, {open, ^Chn, RemoteChn, {session}}} ->
>
> The alternative way with temporary variables and guards are much harder to
> follow:
>
>     receive
>         {'DOWN', Ref1, process, Proc1, _Info} when Ref =:= Ref1, Proc =:=
> Proc1->
>             badarg;
>
>     receive
>         {ssh_cm, SSH1, {open, Chn1, RemoteChn, {session}}} when SSH1 =:=
> SSH, Chn1 =:= Chn->
>
>
> --------------------------------------------------
> Some comments existed mainly because the use was not obvious:
>
>     #{Start:=FromPos} = SPos,               %Assertion.
>
>     {Name, N, Reply} ->  %% Name is bound !!!
>
> Annotations are checkable comments:
>
>     #{Start:=^FromPos} = SPos,
>
>     {^Name, ^N, Reply} ->
>
>
> --------------------------------------------------
> Some possible bugs immediately stand out when ^-annotated:
>
>     [L1,_L2|^Rest] when is_list(L1) -> ...
>
> The rest of the list is alreay bound? Seems fishy.
>
>
>     case lists:keyfind(AppName, 1, StopRunning) of
>         {^_AppName, ^Id} -> ...
>
> An underscore-prefixed variable which is already bound? Probably working
> by luck, not by intention.
>
>
>     parse_top(Line0, DecodeOpts, D) ->
>         {Label,Line1} = get_label(Line0),
>         {Term,Line,^D} = parse_term(Line1, DecodeOpts, D),
>
> Did the author really expect parse_term to return the same D here?
>
>
>
> ^E0=processed_whole_element(S,Pos,Name,Attrs1,Lang,Parents,NSI,Namespace),
>
> This whole line turned out to occur twice in the function body, but gets
> the same
> result both times, thankfully.
>
>
> --------------------------------------------------
> Some weird code becomes obvious when annotated.
>
> What does this line do?
>
>     _ = [M = M:module_info(module) || M <- Needed],
>
> Oh, it's a multi-assertion!
>
>     _ = [^M = M:module_info(module) || M <- Needed],
>
>
> --------------------------------------------------
> Some uses indicate that you should probably have written this in a less
> cute way:
>
>     {M, ^F, A} = MFA = {cerl:atom_val(cerl:call_module(Guard)), F,
> length(Args)},
>
> is easier to read and maintain as:
>
>     M = cerl:atom_val(cerl:call_module(Guard)),
>     A = length(Args),
>     MFA = {M, F, A},
>
>
> --------------------------------------------------
> Some cases are just very unclear if they are intentional or not, until you
> have a full understanding of what the code does:
>
>     select_bin_seg(#k_val_clause{val=#k_bin_int{size=Sz,unit=U,flags=Fs,
>                                                 val=Val,next=Next},
>                                  body=B},
>                    #k_var{}=Src, Fail, St0) ->
>         Ctx = get_context(Src, St0),
>         {Mis,St1} = select_extract_int(Next, Val, Sz, U, Fs, Fail,
>                                          Ctx, St0),
>         {Bis,St} = match_cg(B, Fail, St1),
>         Is = case Mis ++ Bis of
>
>  [#b_set{op=bs_match,args=[#b_literal{val=string},OtherCtx1,Bin1]},
>                   #b_set{op={succeeded,guard},dst=Bool1},
>                   #b_br{bool=Bool1,succ=Succ,fail=Fail},
>                   {label,Succ},
>
> #b_set{op=bs_match,dst=Dst,args=[#b_literal{val=string},_OtherCtx2,Bin2]}|
>                   [#b_set{op={succeeded,guard},dst=Bool2},
>                    #b_br{bool=Bool2,fail=Fail}|_]=Is0] ->
>                   ...
>
> Is the use of Fail in the patterns above intentional and important?
> There are no comments to guide you.
>
> In the following, is the duplicated call to ssa_args/2 (with identical
> arguments) and the match on the already bound As with the (hopefully same)
> result of the second call intentional?
>
>     test_cg(Test, Inverted, As0, Fail, St0) ->
>         As = ssa_args(As0, St0),
>         case {Test,ssa_args(As0, St0)} of
>
> {is_record,[Tuple,#b_literal{val=Atom}=Tag,#b_literal{val=Int}=Arity]}
>               when is_atom(Atom), is_integer(Int) ->
>                 false = Inverted,                   %Assertion.
>                 test_is_record_cg(Fail, Tuple, Tag, Arity, St0);
>             {_,As} ->
>                 {Bool,St1} = new_ssa_var('@ssa_bool', St0),
>                 ...
>         end.
>
> In the below, is the assertion intentional, or should it have been a new
> variable "Plt1 = InitState#st.plt"?
>
>     get_warnings(Callgraph, Plt, DocPlt, Codeserver,
>                  TimingServer, Solvers, Parent) ->
>         InitState =
>           init_state_and_get_success_typings(Callgraph, Plt, Codeserver,
>                                              TimingServer, Solvers,
> Parent),
>         Mods = dialyzer_callgraph:modules(InitState#st.callgraph),
>         Plt = InitState#st.plt,
>         CWarns =
>           dialyzer_contracts:get_invalid_contract_warnings(Mods,
> Codeserver, Plt),
>         ...
>
> Can you spot the assertion in the below?
>
>     do_init_trans_id_counter(ConnHandle, Item, Incr) ->
>         case megaco_config:lookup_local_conn(ConnHandle) of
>             [] ->
>                 {error, {no_such_connection, ConnHandle}};
>             [ConnData] ->
>                 %% Make sure that the counter still does not exist
>                 LocalMid = ConnHandle#megaco_conn_handle.local_mid,
>                 Min      = user_info(LocalMid, min_trans_id),
>                 Max      =
>                     case ConnData#conn_data.max_serial of
>                         infinity ->
>                             4294967295;
>                         MS ->
>                             MS
>                     end,
>                 Item     = ?TID_CNT(LocalMid),
>                 Incr2    = {2, Incr, Max, Min},
>                 case (catch ets:update_counter(megaco_config, Item,
> Incr2)) of
>                 ...
>
> Is the assertion below intentional, or a left-over from some old
> refactoring?
> (There are no other calls to write_to_store()).
>
>                 ...
>                 Oid = {Tab, element(2, Val)},
>                 case LockKind of
>                     write ->
>                         mnesia_locker:wlock(Tid, Store, Oid);
>                     sticky_write ->
>                         mnesia_locker:sticky_wlock(Tid, Store, Oid);
>                     _ ->
>                         abort({bad_type, Tab, LockKind})
>                 end,
>                 write_to_store(Tab, Store, Oid, Val);
>             ...
>
>     write_to_store(Tab, Store, Oid, Val) ->
>         {_, _, Type} = mnesia_lib:validate_record(Tab, Val),
>         Oid = {Tab, element(2, Val)},
>         ...
>
> This double file header check is probably intentional, but a comment would
> have helped. With a ^-annotation on the second H0, the intent would have
> been clear:
>
>             try dets_v9:check_file_header(FH, Fd) of
>                 {ok, H0} ->
>                     case dets_v9:check_file_header(FH, Fd) of
>                         {ok, H0} ->
>
>
>
>         /Richard
>
>
> Den fre 15 jan. 2021 kl 13:34 skrev Richard Carlsson <
> carlsson.richard@REDACTED>:
>
>> There have been many strong reactions in this thread, so let me give you
>> some statistics to show how much this feature of using bound variables is
>> actually used in practice. I checked the entire OTP codebase: there are
>> just over 1300 modules, and in total about 595000 variable occurrences in
>> patterns, of which only 3350 are already bound.. That makes 0.56% of all
>> variables in patterns - about once in 200 to make it simple. On average,
>> that's 2-3 usages per module - some modules using it more and some not
>> using it at all.
>>
>> I find it hard to see, then, why it should be a big issue to ask
>> programmers to annotate these few occurrences for readability and
>> maintainability. It's certainly not as big of a change as for example when
>> the warning for unused variables, unless prefixed with _, was made the
>> default.
>>
>> Imagine a world where Erlang had not allowed already-bound variables in
>> patterns (forcing you to use the idiom "X1 when X1 =:= X -> ...", as in
>> e.g. Haskell), and that someone now came with the suggestion that to make
>> things simpler, we could just implicitly match on the value of X if X is
>> already bound. The old me from my university days would probably have said
>> "that's really elegant, let's do it". But the
>> maintainability-and-readability me, with experience of very large code
>> bases, large numbers of developers, and many relative newcomers to the
>> language, would say "aw hell no". This is a cute feature, but it carries a
>> large cognitive cost and is not worth having compared to how relatively
>> little it is used. Being explicit about intention is much more important.
>>
>>         /Richard
>>
>>
>> Den tors 24 dec. 2020 kl 21:10 skrev Richard Carlsson <
>> carlsson.richard@REDACTED>:
>>
>>> The ^ operator allows you to annotate already-bound pattern variables as
>>> ^X, like in Elixir. This is less error prone when code is being refactored
>>> and moved around so that variables previously new in a pattern may become
>>> bound, or vice versa, and makes it easier for the reader to see the intent
>>> of the code.
>>>
>>> See also https://github.com/erlang/otp/pull/2951
>>>
>>> Ho ho ho,
>>>
>>>         /Richard & the good folks at WhatsApp
>>>
>>

-- 
http://www.linkedin.com/in/torbenhoffmann
@LeHoff
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20210122/53fa0234/attachment.htm>


More information about the erlang-questions mailing list