Author: Richard carlsson <carlsson.richard(at)gmail(dot)com>
Status: Draft
Type: Standards Track
Created: 21-Dec-2020
Erlang-Version: 24
Post-History: 24-Dec-2020

EEP 55: Pinning operator ^ in patterns

Abstract

This EEP proposes the addition of a new unary operator ^ for explicitly marking variables in patterns as being already bound. This is known as "pinning" in Elixir - see [Elixir doc][the Elixir documentation].

For example:

f(X, Y) ->
    case X of
        {a, Y} -> ok;
        _ -> error
    end.

could be written more explicitly:

f(X, Y) ->
    case X of
        {a, ^Y} -> ok;
        _ -> error
    end.

In Elixir, this operator is strictly necessary for being able to refer to the value of a bound variable as part of a pattern, because variables in patterns are always regarded as being new shadowing instances (like in Erlang's fun clause heads), unless explicitly pinned.

In Erlang, they would be optional, but are still a good idea because they make programs more robust under edits and refactorings, and furthermore allow the use of pinned variables in fun clause heads and in comprehension generator patterns.

Specification

A new unary operator ^ is added to Erlang, called the "pinning operator". It may only be used in patterns, and only on variables. Its meaning is that the "pinned" variable is to be interpreted in the enclosing environment of the pattern, and its value used in its place for that position in the pattern.

In current Erlang, this behaviour is what happens automatically in ordinary matching constructs if the variable is already bound in the enclosing environment. In the following example:

f(X, Y) ->
    case X of
        {a, Y} -> {ok, Y};
        _ -> error
    end.

the use of Y in the pattern is regarded as a reference to the function parameter Y, instead of as introducing a new variable, and the Y in the clause body is then that same parameter. Therefore, annotating the pattern variable as ^Y in this case does not change the behaviour of the program, but makes the intent explicit:

f(X, Y) ->
    case X of
        {a, ^Y} -> {ok, Y};
        _ -> error
    end.

For fun expressions and list comprehension generator patterns, the pinning operator makes the language more expressive. Take the following Erlang code:

f(X, Y) ->
    F = fun ({a, Y}) -> {ok, Y};
            (_) -> error
        end,
    F(X).

Here, the occurrence of Y in the clause head of the fun F is a new variable instance, shadowing the Y parameter of f(X, Y), and the fun clause will match any value in that position. The Y in the clause body is the one bound in the clause head. However, using the pinning operator, we can selectively match on variables bound in the outer scope:

f(X, Y) ->
    F = fun ({a, ^Y})  -> {ok, Y};
            (_) -> error
        end,
    F(X).

In this case, there is no new binding of Y, and the use of Y in the fun clause body refers to the function parameter. But it is also possible to combine pinning and shadowing in the same pattern:

f(X, Y) ->
    F = fun ({a, ^Y, Y})  -> {ok, Y};
            (_) -> error
        end,
    F(X).

In this case, the pinned field refers to the value of the function function parameter, but there is also a new shadowing binding of Y to the third field of the tuple. The use in the fun clause body now refers to the shadowing instance.

Generator patterns in list comprehensions or binary comprehensions follow the same rules as fun clause heads, so with pinning we can for example write the following code:

f(X, Y) ->
    [{b, Y} || {a, ^Y, Y} <- X].

where the Y in {b, Y} is the shadowing instance bound to the third element of the pattern tuple.

Finally, a new compiler flag warn_unpinned_vars is added, disabled by default, which if enabled makes the compiler emit warnings about all uses of already bound variables in patterns that are not explicitly annotated with the ^ operator. This allows users to migrate their code module by module towards using explicit pinning in all their code. If pinning becomes the norm in Erlang, this flag could be turned on by default, and eventually, the pinning operator could become strictly required for referring to already bound variables in patterns.

Rationale

The explicit pinning of variables in patterns make programs more readable, because the intent of the code becomes clear. When already bound variables are used in Erlang without any annotation, anyone reading a piece of code must first study it closely to understand which variables will be bound at the point of a pattern, before they can tell whether any pattern variable is a new binding or implies an equality assertion. This is easy to miss even for experienced Erlangers, be it during code reviews or while trying to understand a piece of poorly commented code.

Perhaps more importantly, pinning also makes programs more robust under edits and refactorings. Take our previous example, and add a print statement:

f(X, Y) ->
    io:format("checking: ~p", [Y]),
    case X of
        {a, Y} -> {ok, Y};
        _ -> error
    end.

Suppose someone renames the function parameter from Y to Z and updates the print statement but forgets to update the use in the case clause. Without an explicit pinning annotation, the change would be quietly allowed, but the Y in the pattern would be interpreted as a new variable that will match any value, which will then be used in the body. This changes the behaviour of the program. If the use in the pattern had been annotated as ^Y, the compiler would have generated an error "Y is unbound" and the mistake would have been caught.

When code is being modified to add a feature or fix a bug, a programmer might want to introduce a new variable for a temporary result. In a long function body, this risks introducing a new bug. Consider the following:

g(Stuff) ->
   ...
   Thing = case ... of
               {a, T} -> T;
               _ -> 0
           end,
   ...
   {ok, [Thing|Stuff]}.

Here, T is a new variable, clearly intended as just a temporary and local variable for extracting the second element of the tuple. But suppose that someone adds a binding of the name T further up in the function body, without noticing that the name is already in use:

g(Stuff) ->
   ...
   T = q(Stuff) + 1,
   io:format("~p", [p(T)]),
   ...
   Thing = case ... of
               {a, T} -> T;
               _ -> 0
           end,
   ...
   {ok, [Thing|Stuff]}.

Now the first clause of the case switch will only match if the second element of the tuple has the exact same value as the previously defined T. Again, the compiler quietly accepts this change, while if it had been instructed to warn about all non-annotated uses of already bound variables in patterns, this mistake would have been detected.

Shadowing in Funs and Comprehensions

In funs and comprehensions, pinning also lets us do things that otherwise requires additional temporary variables. Consider the following code:

f(X, Y) ->
    F = fun ({a, Y}) -> {ok, Y};
            (_) -> error
        end,
    F(X).

Since the Y in the clause head of the fun is a new shadowing instance, the pattern will match any value in that position. To match only the value passed as Y to f, a clause guard must be added, and a temporary variable be used to access the outer Y:

f(X, Y) ->
    OuterY = Y,
    F = fun ({a, Y}) when Y =:= OuterY -> {ok, Y};
            (_) -> error
        end,
    F(X).

We could instead rename the inner use of Y to avoid shadowing, but the equality test must still be written as an explicit guard:

f(X, Y) ->
    F = fun ({a, Z}) when Z =:= Y -> {ok, Y};
            (_) -> error
        end,
    F(X).

With the help of the pinning operator, such things are no longer a concern, and we can simply write:

f(X, Y) ->
    F = fun ({a, ^Y}) -> {ok, Y};
            (_) -> error
        end,
    F(X).

Furthermore, in the odd case that the pattern would both need to access the surrounding definition of Y as well as introduce a new shadowing binding, this can be easily written using pinning:

f(X, Y) ->
    F = fun ({a, ^Y, Y})  -> {ok, Y};
            (_) -> error
        end,
    F(X).

but in current Erlang, two separate temporary variables would be required:

f(X, Y) ->
    OuterY = Y,
    F = fun ({a, Temp, Y}) when Temp =:= OuterY -> {ok, Y};
            (_) -> error
        end,
    F(X).

As explained before, the same goes for patterns in generators of comprehensions.

Backwards Compatibility

The addition of a new and previously unused operator ^ does not affect the meaning of existing code, and the compiler will not emit any new warnings or errors for existing code, unless explicitly enabled with warn_unpinned_vars. This change is therefore fully backwards compatible.

Implementation

The implementation can be found in PR #2951.

Copyright

This document has been placed in the public domain.