<div dir="ltr"><div>These examples show very clearly (IMO) that in practice matching with an already bound variable is open to very subtle bugs.</div><div><br></div><div>For my own code it is less of an issue since I strive to write small functions, but in a lot of code functions are really long - that just happens.</div><div><br></div><div>I'll repeat myself, but the fact that pinning allows you to be very precise about the _intention_ of your match is a good thing.</div><div>As recently as yesterday I did a refactoring where a ^ would have made my intentional re-use of an already bound variable in a function head easier to read. <br></div><div><br></div><div>Adding more guards in a Haskell style does not provide the same sort of easy communication of intention wrt a match.</div><div>Warnings going in that direction would be something I'd always turn off - that is simply a bad idea IMO.</div><div><br></div><div>I was initially on the fence about this, but a better way of showing intention and all the real-life examples have convinced me that I am in favour of this feature.</div><div><br></div><div>I don't care if ^ is used or some other symbol. I just want the clearer intention.<br></div><div><br></div><div>Cheers,</div><div>Torben<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 21 Jan 2021 at 15:41, Richard Carlsson &lt;<a href="mailto:carlsson.richard@gmail.com">carlsson.richard@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">My own favourite quote by ROK is "an example would be useful about now", so in the spirit of that, I converted the OTP codebase to use ^-annotations everywhere, to get something concrete. This took me all of one afternoon: I rebuilt OTP from scratch with the warnings enabled, collected the warnings and converted them into a naive sed script that did the edits.
Then I had to do some manual cleanup for those cases where my script had been too naive. Today I also took care of converting the test suites, which I missed in the first pass. (The test suites added about 2000 further uses of already-bound variables.)<div><br></div><div>To look at the diff, you can browse <a href="https://github.com/richcarl/otp/commits/pinning-otp" target="_blank">https://github.com/richcarl/otp/commits/pinning-otp</a></div><div>or if you prefer, fetch it to your otp repo:</div><div><br></div><div> git fetch <a href="https://github.com/richcarl/otp.git" target="_blank">https://github.com/richcarl/otp.git</a> pinning-otp</div><div> git log FETCH_HEAD<br clear="all"><div><div dir="ltr"> git show FETCH_HEAD^ (for the normal modules)<br></div><div dir="ltr"><div dir="ltr"> git show FETCH_HEAD (for the testsuites)<br></div><div dir="ltr"></div></div><div dir="ltr"><br></div><div>Note though that I did this as quickly as possible just to make it pass the build. To do a conversion like this for real, the annotated code should at least be superficially inspected by someone who knows it, because I think some of the existing uses are wrong or at least not quite what you want to do, and this would be your main chance to find and correct them.</div><div><br></div><div>On the whole, I find the annotated code so much more readable (in those places where already-bound variables are used), that it's not even funny when I think about all the time lost over the years to staring at the code and trying to see which variables are already bound and how that affects the control flow. 
With ^-annotations, these uses stick out very clearly even when quickly scanning the code in less or git diff.</div><div><br></div><div>Here follow some observations I've made from just looking at this diff for things that stand out as odd:</div><div><br></div><div>--------------------------------------------------<br>Multiple pinned variables on the same line - immediately visible what's happening<br><br> receive<br> {'DOWN', ^Ref, process, ^Proc, _Info} -><br> badarg;<br><br> receive<br> {ssh_cm, ^SSH, {open, ^Chn, RemoteChn, {session}}} -><br><br>The alternative way with temporary variables and guards is much harder to follow:<br><br> receive<br> {'DOWN', Ref1, process, Proc1, _Info} when Ref =:= Ref1, Proc =:= Proc1 -><br> badarg;<br><br> receive<br> {ssh_cm, SSH1, {open, Chn1, RemoteChn, {session}}} when SSH1 =:= SSH, Chn1 =:= Chn -><br><br><br>--------------------------------------------------<br>Some comments existed mainly because the use was not obvious:<br><br> #{Start:=FromPos} = SPos, %Assertion.<br><br> {Name, N, Reply} -> %% Name is bound !!!<br><br>Annotations are checkable comments:<br><br> #{Start:=^FromPos} = SPos,<br><br> {^Name, ^N, Reply} -><br><br><br>--------------------------------------------------<br>Some possible bugs immediately stand out when ^-annotated:<br><br> [L1,_L2|^Rest] when is_list(L1) -> ...<br><br>The rest of the list is already bound? Seems fishy.<br><br><br> case lists:keyfind(AppName, 1, StopRunning) of<br> {^_AppName, ^Id} -> ...<br><br>An underscore-prefixed variable which is already bound?
Probably working by luck, not by intention.<br><br><br> parse_top(Line0, DecodeOpts, D) -><br> {Label,Line1} = get_label(Line0),<br> {Term,Line,^D} = parse_term(Line1, DecodeOpts, D),<br><br>Did the author really expect parse_term to return the same D here?<br><br><br> ^E0=processed_whole_element(S,Pos,Name,Attrs1,Lang,Parents,NSI,Namespace),<br><br>This whole line turned out to occur twice in the function body, but gets the same<br>result both times, thankfully.<br><br><br>--------------------------------------------------<br>Some weird code becomes obvious when annotated.<br><br>What does this line do?<br><br> _ = [M = M:module_info(module) || M <- Needed],<br><br>Oh, it's a multi-assertion!<br><br> _ = [^M = M:module_info(module) || M <- Needed],<br><br><br>--------------------------------------------------<br>Some uses indicate that you should probably have written this in a less cute way:<br><br> {M, ^F, A} = MFA = {cerl:atom_val(cerl:call_module(Guard)), F, length(Args)},<br><br>is easier to read and maintain as:<br><br> M = cerl:atom_val(cerl:call_module(Guard)),<br> A = length(Args),<br> MFA = {M, F, A},<br><br><br>--------------------------------------------------<br>Some cases are just very unclear if they are intentional or not, until you<br>have a full understanding of what the code does:<br><br> select_bin_seg(#k_val_clause{val=#k_bin_int{size=Sz,unit=U,flags=Fs,<br> val=Val,next=Next},<br> body=B},<br> #k_var{}=Src, Fail, St0) -><br> Ctx = get_context(Src, St0),<br> {Mis,St1} = select_extract_int(Next, Val, Sz, U, Fs, Fail,<br> Ctx, St0),<br> {Bis,St} = match_cg(B, Fail, St1),<br> Is = case Mis ++ Bis of<br> [#b_set{op=bs_match,args=[#b_literal{val=string},OtherCtx1,Bin1]},<br> #b_set{op={succeeded,guard},dst=Bool1},<br> #b_br{bool=Bool1,succ=Succ,fail=Fail},<br> {label,Succ},<br> #b_set{op=bs_match,dst=Dst,args=[#b_literal{val=string},_OtherCtx2,Bin2]}|<br> [#b_set{op={succeeded,guard},dst=Bool2},<br> #b_br{bool=Bool2,fail=Fail}|_]=Is0] -><br> 
...<br><br>Is the use of Fail in the patterns above intentional and important?<br>There are no comments to guide you.<br><br>In the following, is the duplicated call to ssa_args/2 (with identical<br>arguments) and the match on the already bound As with the (hopefully same)<br>result of the second call intentional?<br><br> test_cg(Test, Inverted, As0, Fail, St0) -><br> As = ssa_args(As0, St0),<br> case {Test,ssa_args(As0, St0)} of<br> {is_record,[Tuple,#b_literal{val=Atom}=Tag,#b_literal{val=Int}=Arity]}<br> when is_atom(Atom), is_integer(Int) -><br> false = Inverted, %Assertion.<br> test_is_record_cg(Fail, Tuple, Tag, Arity, St0);<br> {_,As} -><br> {Bool,St1} = new_ssa_var('@ssa_bool', St0),<br> ...<br> end.<br><br>In the below, is the assertion intentional, or should it have been a new<br>variable "Plt1 = InitState#st.plt"?<br><br> get_warnings(Callgraph, Plt, DocPlt, Codeserver,<br> TimingServer, Solvers, Parent) -><br> InitState =<br> init_state_and_get_success_typings(Callgraph, Plt, Codeserver,<br> TimingServer, Solvers, Parent),<br> Mods = dialyzer_callgraph:modules(InitState#st.callgraph),<br> Plt = InitState#st.plt,<br> CWarns =<br> dialyzer_contracts:get_invalid_contract_warnings(Mods, Codeserver, Plt),<br> ...<br><br>Can you spot the assertion in the below?<br><br> do_init_trans_id_counter(ConnHandle, Item, Incr) -><br> case megaco_config:lookup_local_conn(ConnHandle) of<br> [] -><br> {error, {no_such_connection, ConnHandle}};<br> [ConnData] -><br> %% Make sure that the counter still does not exist<br> LocalMid = ConnHandle#megaco_conn_handle.local_mid,<br> Min = user_info(LocalMid, min_trans_id),<br> Max =<br> case ConnData#conn_data.max_serial of<br> infinity -><br> 4294967295;<br> MS -><br> MS<br> end,<br> Item = ?TID_CNT(LocalMid),<br> Incr2 = {2, Incr, Max, Min},<br> case (catch ets:update_counter(megaco_config, Item, Incr2)) of<br> ...<br><br>Is the assertion below intentional, or a left-over from some old refactoring?<br>(There are no other calls 
to write_to_store()).<br><br> ...<br> Oid = {Tab, element(2, Val)},<br> case LockKind of<br> write -><br> mnesia_locker:wlock(Tid, Store, Oid);<br> sticky_write -><br> mnesia_locker:sticky_wlock(Tid, Store, Oid);<br> _ -><br> abort({bad_type, Tab, LockKind})<br> end,<br> write_to_store(Tab, Store, Oid, Val);<br> ...<br><br> write_to_store(Tab, Store, Oid, Val) -><br> {_, _, Type} = mnesia_lib:validate_record(Tab, Val),<br> Oid = {Tab, element(2, Val)},<br> ...<br><br>This double file header check is probably intentional, but a comment would<br>have helped. With a ^-annotation on the second H0, the intent would have<br>been clear:<br><br> try dets_v9:check_file_header(FH, Fd) of<br> {ok, H0} -><br> case dets_v9:check_file_header(FH, Fd) of<br> {ok, H0} -><br></div><div><br></div><div><br></div><div><br></div><div dir="ltr"> /Richard</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 15 Jan 2021 at 13:34, Richard Carlsson &lt;<a href="mailto:carlsson.richard@gmail.com" target="_blank">carlsson.richard@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>There have been many strong reactions in this thread, so let me give you some statistics to show how much this feature of using bound variables is actually used in practice. I checked the entire OTP codebase: there are just over 1300 modules, and in total about 595000 variable occurrences in patterns, of which only 3350 are already bound. That makes 0.56% of all variables in patterns - about once in 200 to make it simple. On average, that's 2-3 usages per module - some modules using it more and some not using it at all.</div><div><br></div><div>I find it hard to see, then, why it should be a big issue to ask programmers to annotate these few occurrences for readability and maintainability.
It's certainly not as big a change as, for example, when the warning for unused variables, unless prefixed with _, was made the default.</div><div><br></div><div>Imagine a world where Erlang had not allowed already-bound variables in patterns (forcing you to use the idiom "X1 when X1 =:= X -> ...", as in e.g. Haskell), and that someone now came along with the suggestion that to make things simpler, we could just implicitly match on the value of X if X is already bound. The old me from my university days would probably have said "that's really elegant, let's do it". But the maintainability-and-readability me, with experience of very large code bases, large numbers of developers, and many relative newcomers to the language, would say "aw hell no". This is a cute feature, but it carries a large cognitive cost and is not worth having compared to how relatively little it is used. Being explicit about intention is much more important.</div><div><br></div><div><div dir="ltr"> /Richard</div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 24 Dec 2020 at 21:10, Richard Carlsson &lt;<a href="mailto:carlsson.richard@gmail.com" target="_blank">carlsson.richard@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">The ^ operator allows you to annotate already-bound pattern variables as ^X, like in Elixir. This is less error-prone when code is being refactored and moved around so that variables previously new in a pattern may become bound, or vice versa, and makes it easier for the reader to see the intent of the code.<div><br></div><div>See also <a href="https://github.com/erlang/otp/pull/2951" target="_blank">https://github.com/erlang/otp/pull/2951</a><br clear="all"><div><div dir="ltr"><br></div><div>Ho ho ho,</div><div dir="ltr"><br> /Richard &amp; the good folks at WhatsApp</div></div></div></div>
</blockquote></div>
</blockquote></div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><a href="http://www.linkedin.com/in/torbenhoffmann" target="_blank">http://www.linkedin.com/in/torbenhoffmann</a><br></div>@LeHoff<br></div></div>