[erlang-questions] Updates, lenses, and why cross-module inlining would be nice

Thu Nov 26 05:52:18 CET 2015

Summary:
 (1) A promising way to solve the nested updates problem
     is to be more functional, not less.
 (2) There's an approach we could steal from Haskell.
 (3) There's free code to play with, worth every penny.

Consider a simplified version of Pascal:

  <variable> ::= <identifier> <selector>*
  <selector> ::= <dereference> | <field> | <index>
  <dereference> ::= '^'
  <field> ::= '.' <identifier>
  <index> ::= '[' <expression> ']'

If we think of a selector as a function
  f :: var T1 -> var T2
it's clear that <identifier> <selector> is function
application and <selector 1> <selector 2> is
function composition.

Yet in Pascal, and C, and languages like them,
while you can *apply* selectors and *compose* them,
they cannot be passed as parameters or even named.

If a functional language has something like selectors,
they should be first class values, just like other
more-or-less-function-like things.  But what kind of
value?

I think it was Reynolds, analysing Algol 60, who came
up with the idea that a pass-by-name parameter was
really a *pair* of functions:  one to fetch a value
and one to store it, and thereby freed us from the
limiting idea that there needed to be a 'var T' type.

The Haskell community have taken this idea and
developed it.  There this concept is called a "Lens".

http://www.cs.otago.ac.nz/staffpriv/ok/lens.erl

is a crude implementation of lenses in Erlang.
A lens is a selector from (values of) type A to
(values of) type B.  It's represented as a triple

    {Get, Put, Upd}
    Get :: (A) -> B
    Put :: (A, B) -> A
    Upd :: (A, (B) -> B) -> A

The Get function says how to extract a B value from an
A value.  The Put function says how to put a B value
into an existing A value, returning a new A value.
The Upd function says how to compute a new A value by
extracting a B, applying a function to it, and putting
it back.  We expect the following laws to hold:

    Get(Put(A, B)) = B
    Put(A, Get(A)) = A
    Upd(A, Fun) = Put(A, Fun(Get(A)))

That is, if you make a lens by hand, you should ensure
that this is true.

There are functions for using lenses so you don't have
to think about the representation.

    lens:get(Lens, A) -> B
    lens:put(Lens, A, B) -> A'
    lens:update(Lens, A, F) -> A'

There are also functions for making simple lenses.
For example, if you have a record 'rec' with a field
'fld', lens:tuple(#rec.fld) will give you a lens
for that field.

Warning: the functions in the lens returned by
lens:tuple(#rec.fld)
do NOT check at run time that they are dealing with
a 'rec' record.  They can't, because #rec.fld is just
a number, and you can't get back from that number to
'rec' or the arity of the record.

The important thing is that these functions compose.

For example, suppose you have
  -record(foo, {boo,coo,goo,zoo}).
and Foozle is a list of these, and you want to find
the first record with boo=42, and increment its zoo
field by 137.

    Find = lens:where(fun (#foo{boo=Boo}) -> Boo == 42 end)

is a lens for finding the record you want and

    Field = lens:tuple(#foo.zoo)

is a lens for getting at the zoo field of such a record.
lens.erl provides composition of 2 .. 6 lenses.  So

    Selector = lens:c(Find, Field)

is a selector that refers to a field of a record in a list,
and

    lens:update(Selector, Foozle, fun (Zoo) -> Zoo + 137 end)

does the whole job (in just one pass through the list, too,
which is why the Upd function is there).

More simply,
    lens:c(lens:tuple(#a.b), lens:tuple(#c.d),
           lens:tuple(#e.f), lens:tuple(#g.h))
is the equivalent of .b.d.f.h in Pascal, except that it is
a value you can pass around like any other.

You are not limited to fields that actually exist as such.
For example, you could do

gb_set(Element) ->
    {   fun (Set) -> gb_sets:is_member(Element, Set) end
    ,   fun (Set, false) -> gb_sets:delete_any(Element, Set)
          ; (Set, true)  -> gb_sets:add(Element, Set)
        end
    ,   fun (Set, Fun) ->
            Old = gb_sets:is_member(Element, Set),
            case Fun(Old)
              of Old   -> Set
               ; true  -> gb_sets:insert(Element, Set)
               ; false -> gb_sets:delete(Element, Set)
            end
        end
    }.

and make a gb-set act like a dictionary with Boolean elements.

There's even a combinator all/1 so that
    All = lens:c(lens:tuple(1)),
    Data = [{3,a},{1,b},{4,c},{6,d}]
    lens:get(All, Data)
 => [3,1,4,6]
    lens:update(All, Data, fun (N) -> N*10 end)
 => [{30,a},{10,b},{40,c},{60,d}]

So what about cross-module inlining?

By the time you've made and composed a couple of lenses,
you have a tangle of funs.  If the compiler (a) always
inlines (fun (...) -> ... end)(...) and (b) inlines the
functions from the lens module, then the tangle resolves
tidily into reasonably straightforward code.

But the cross-module inlining step is crucial.  Without
that, the compiler has nothing to work on.

At its crudest, consider

    lens:get(lens:tuple(#rec.fld), Rec)

*With* cross-module inlining and inlining of funs that
are applied to known arguments and dead code elimination,
you end up with

    element(#rec.fld, Rec)

*Without* cross-module inlining, three funs are built,
only one of which will ever be called.