New EEP draft: Pinning operator ^ in patterns

Thu Jan 21 17:43:49 CET 2021

I've tried to just say my piece and leave it at that (summary: not a fan but would be nice to have a way to match in fun heads/list comprehensions where variables are shadowed) but have to say that after seeing these examples I'm starting to lean towards agreeing with Richard.

On Thu, Jan 21, 2021, at 07:41, Richard Carlsson wrote:
> My own favourite quote by ROK is "an example would be useful about now", so in the spirit of that, I converted the OTP codebase to use ^-annotations everywhere, to get something concrete. This took me all of one afternoon: I rebuilt OTP from scratch with the warnings enabled, collected the warnings and converted them into a naive sed script that did the edits. Then I had to do some manual cleanup for those cases where my script had been too naive. Today I also took care of converting the test suites, which I missed in the first pass. (The test suites added about 2000 further uses of already-bound variables.)
> 
> To look at the diff, you can browse https://github.com/richcarl/otp/commits/pinning-otp
> or if you prefer, fetch it to your otp repo:
> 
>         git fetch https://github.com/richcarl/otp.git pinning-otp
>         git log FETCH_HEAD
>         git show FETCH_HEAD^       (for the normal modules)
>         git show FETCH_HEAD         (for the testsuites)
> 
> 
> Note though that I did this as quickly as possible just to make it pass the build. To do a conversion like this for real, the annotated code should at least be superficially inspected by someone who knows it, because I think some of the existing uses are wrong or at least not quite what you want to do, and this would be your main chance to find and correct them.
> 
> On the whole, I find the annotated code so much more readable (in those places where already-bound variables are used), that it's not even funny when I think about all the time lost over the years to staring at the code and trying to see which variables are already bound and how that affects the control flow. With ^-annotations, these uses stick out very clearly even when quickly scanning the code in less or git diff.
> 
> Here follow some observations I've made from just looking at this diff for things that stand out as odd:
> 
> --------------------------------------------------
> Multiple pinned variables on same line - immediately visible what's happening
> 
>     receive
>         {'DOWN', ^Ref, process, ^Proc, _Info} ->
>             badarg;
> 
>     receive
>         {ssh_cm, ^SSH, {open, ^Chn, RemoteChn, {session}}} ->
> 
> The alternative way with temporary variables and guards are much harder to follow:
> 
>     receive
>         {'DOWN', Ref1, process, Proc1, _Info} when Ref =:= Ref1, Proc =:= Proc1->
>             badarg;
> 
>     receive
>         {ssh_cm, SSH1, {open, Chn1, RemoteChn, {session}}} when SSH1 =:= SSH, Chn1 =:= Chn->
> 
> 
> --------------------------------------------------
> Some comments existed mainly because the use was not obvious:
> 
>     #{Start:=FromPos} = SPos,               %Assertion.
> 
>     {Name, N, Reply} ->  %% Name is bound !!!
> 
> Annotations are checkable comments:
> 
>     #{Start:=^FromPos} = SPos,
> 
>     {^Name, ^N, Reply} ->
> 
> 
> --------------------------------------------------
> Some possible bugs immediately stand out when ^-annotated:
> 
>     [L1,_L2|^Rest] when is_list(L1) -> ...
> 
> The rest of the list is alreay bound? Seems fishy.
> 
> 
>     case lists:keyfind(AppName, 1, StopRunning) of
>         {^_AppName, ^Id} -> ...
> 
> An underscore-prefixed variable which is already bound? Probably working by luck, not by intention.
> 
> 
>     parse_top(Line0, DecodeOpts, D) ->
>         {Label,Line1} = get_label(Line0),
>         {Term,Line,^D} = parse_term(Line1, DecodeOpts, D),
> 
> Did the author really expect parse_term to return the same D here?
> 
> 
>     ^E0=processed_whole_element(S,Pos,Name,Attrs1,Lang,Parents,NSI,Namespace),
> 
> This whole line turned out to occur twice in the function body, but gets the same
> result both times, thankfully.
> 
> 
> --------------------------------------------------
> Some weird code becomes obvious when annotated.
> 
> What does this line do?
> 
>     _ = [M = M:module_info(module) || M <- Needed],
> 
> Oh, it's a multi-assertion!
> 
>     _ = [^M = M:module_info(module) || M <- Needed],
> 
> 
> --------------------------------------------------
> Some uses indicate that you should probably have written this in a less cute way:
> 
>     {M, ^F, A} = MFA = {cerl:atom_val(cerl:call_module(Guard)), F, length(Args)},
> 
> is easier to read and maintain as:
> 
>     M = cerl:atom_val(cerl:call_module(Guard)),
>     A = length(Args),
>     MFA = {M, F, A},
> 
> 
> --------------------------------------------------
> Some cases are just very unclear if they are intentional or not, until you
> have a full understanding of what the code does:
> 
>     select_bin_seg(#k_val_clause{val=#k_bin_int{size=Sz,unit=U,flags=Fs,
>                                                 val=Val,next=Next},
>                                  body=B},
>                    #k_var{}=Src, Fail, St0) ->
>         Ctx = get_context(Src, St0),
>         {Mis,St1} = select_extract_int(Next, Val, Sz, U, Fs, Fail,
>                                          Ctx, St0),
>         {Bis,St} = match_cg(B, Fail, St1),
>         Is = case Mis ++ Bis of
>                  [#b_set{op=bs_match,args=[#b_literal{val=string},OtherCtx1,Bin1]},
>                   #b_set{op={succeeded,guard},dst=Bool1},
>                   #b_br{bool=Bool1,succ=Succ,fail=Fail},
>                   {label,Succ},
>                   #b_set{op=bs_match,dst=Dst,args=[#b_literal{val=string},_OtherCtx2,Bin2]}|
>                   [#b_set{op={succeeded,guard},dst=Bool2},
>                    #b_br{bool=Bool2,fail=Fail}|_]=Is0] ->
>                   ...
> 
> Is the use of Fail in the patterns above intentional and important?
> There are no comments to guide you.
> 
> In the following, is the duplicated call to ssa_args/2 (with identical
> arguments) and the match on the already bound As with the (hopefully same)
> result of the second call intentional?
> 
>     test_cg(Test, Inverted, As0, Fail, St0) ->
>         As = ssa_args(As0, St0),
>         case {Test,ssa_args(As0, St0)} of
>             {is_record,[Tuple,#b_literal{val=Atom}=Tag,#b_literal{val=Int}=Arity]}
>               when is_atom(Atom), is_integer(Int) ->
>                 false = Inverted,                   %Assertion.
>                 test_is_record_cg(Fail, Tuple, Tag, Arity, St0);
>             {_,As} ->
>                 {Bool,St1} = new_ssa_var('@ssa_bool', St0),
>                 ...
>         end.
> 
> In the below, is the assertion intentional, or should it have been a new
> variable "Plt1 = InitState#st.plt"?
> 
>     get_warnings(Callgraph, Plt, DocPlt, Codeserver,
>                  TimingServer, Solvers, Parent) ->
>         InitState =
>           init_state_and_get_success_typings(Callgraph, Plt, Codeserver,
>                                              TimingServer, Solvers, Parent),
>         Mods = dialyzer_callgraph:modules(InitState#st.callgraph),
>         Plt = InitState#st.plt,
>         CWarns =
>           dialyzer_contracts:get_invalid_contract_warnings(Mods, Codeserver, Plt),
>         ...
> 
> Can you spot the assertion in the below?
> 
>     do_init_trans_id_counter(ConnHandle, Item, Incr) ->
>         case megaco_config:lookup_local_conn(ConnHandle) of
>             [] ->
>                 {error, {no_such_connection, ConnHandle}};
>             [ConnData] ->
>                 %% Make sure that the counter still does not exist
>                 LocalMid = ConnHandle#megaco_conn_handle.local_mid,
>                 Min      = user_info(LocalMid, min_trans_id),
>                 Max      =
>                     case ConnData#conn_data.max_serial of
>                         infinity ->
>                             4294967295;
>                         MS ->
>                             MS
>                     end,
>                 Item     = ?TID_CNT(LocalMid),
>                 Incr2    = {2, Incr, Max, Min},
>                 case (catch ets:update_counter(megaco_config, Item, Incr2)) of
>                 ...
> 
> Is the assertion below intentional, or a left-over from some old refactoring?
> (There are no other calls to write_to_store()).
> 
>                 ...
>                 Oid = {Tab, element(2, Val)},
>                 case LockKind of
>                     write ->
>                         mnesia_locker:wlock(Tid, Store, Oid);
>                     sticky_write ->
>                         mnesia_locker:sticky_wlock(Tid, Store, Oid);
>                     _ ->
>                         abort({bad_type, Tab, LockKind})
>                 end,
>                 write_to_store(Tab, Store, Oid, Val);
>             ...
> 
>     write_to_store(Tab, Store, Oid, Val) ->
>         {_, _, Type} = mnesia_lib:validate_record(Tab, Val),
>         Oid = {Tab, element(2, Val)},
>         ...
> 
> This double file header check is probably intentional, but a comment would
> have helped. With a ^-annotation on the second H0, the intent would have
> been clear:
> 
>             try dets_v9:check_file_header(FH, Fd) of
>                 {ok, H0} ->
>                     case dets_v9:check_file_header(FH, Fd) of
>                         {ok, H0} ->
> 
> 
> 
>         /Richard
> 
> 
> Den fre 15 jan. 2021 kl 13:34 skrev Richard Carlsson <carlsson.richard@REDACTED>:
>> There have been many strong reactions in this thread, so let me give you some statistics to show how much this feature of using bound variables is actually used in practice. I checked the entire OTP codebase: there are just over 1300 modules, and in total about 595000 variable occurrences in patterns, of which only 3350 are already bound.. That makes 0.56% of all variables in patterns - about once in 200 to make it simple. On average, that's 2-3 usages per module - some modules using it more and some not using it at all.
>> 
>> I find it hard to see, then, why it should be a big issue to ask programmers to annotate these few occurrences for readability and maintainability. It's certainly not as big of a change as for example when the warning for unused variables, unless prefixed with _, was made the default.
>> 
>> Imagine a world where Erlang had not allowed already-bound variables in patterns (forcing you to use the idiom "X1 when X1 =:= X -> ...", as in e.g. Haskell), and that someone now came with the suggestion that to make things simpler, we could just implicitly match on the value of X if X is already bound. The old me from my university days would probably have said "that's really elegant, let's do it". But the maintainability-and-readability me, with experience of very large code bases, large numbers of developers, and many relative newcomers to the language, would say "aw hell no". This is a cute feature, but it carries a large cognitive cost and is not worth having compared to how relatively little it is used. Being explicit about intention is much more important.
>> 
>>         /Richard
>> 
>> 
>> Den tors 24 dec. 2020 kl 21:10 skrev Richard Carlsson <carlsson.richard@REDACTED>:
>>> The ^ operator allows you to annotate already-bound pattern variables as ^X, like in Elixir. This is less error prone when code is being refactored and moved around so that variables previously new in a pattern may become bound, or vice versa, and makes it easier for the reader to see the intent of the code.
>>> 
>>> See also https://github.com/erlang/otp/pull/2951
>>> 
>>> Ho ho ho,
>>> 
>>>         /Richard & the good folks at WhatsApp
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20210121/298e2ed7/attachment-0001.htm>