[erlang-questions] Advanced Erlang Subtleties

Thu May 19 22:02:28 CEST 2011

Monday's excellent London User Group meeting about Misultin (by Roberto
Ostinelli) brought up many fascinating discussions. One was about
supervisors vs custom rolling the functionality & storing Pids in an ETS
table (because supervisors store their children as a list); another was how
to make Misultin 0.8 OTP compatible but still quick to get started with &
directly embeddable, etc.

One thing I decided to do after one conversation was to document all the
Erlang syntax (i.e. no OTP) subtleties, or bits I find counter-intuitive,
that I have discovered, for the use of others & also to prompt the
occasional "why was it implemented this way" out of anybody who knows. None
of this is a criticism (and some of it points out things that are lesser
used and that I very much like), as so far I have found Erlang to have very
few inconsistencies, including in error conditions! But knowing all the
subtleties of any language I use is a personal favourite of mine (coming
from JS gives you that!)

so here we go :

*A) List comprehensions are personally my favourite part of Erlang, but have
the most subtleties of anything, partially because they pack a lot of
functionality into a tiny bit of syntax :

1) LC's act as a list filter, then list map
therefore :
List = [{a,b},{c,d,e,f},{h,i}]
[I1||{I1,I2}<-List] will not crash but will instead filter out the 4 element
tuple. This is great, but where is the opposite version of an LC that is
only a map and will crash?
The standard behaviour seems to go against "let it crash", & on a few
occasions I have had code that gives a final [] rather than the expected
output.
Using :
[fun({I1,I2}) -> I1 end(Elem)||Elem<-List]
or
[begin {I1,I2}=Elem, I1 end||Elem<-List]
give a behaviour equivalent to a list map, but are less elegant than a
hypothetical :
[I1||{I1,I2}<---List] or something similar. ( notice the <--- )
I would really welcome this addition to the language ( and many, many users
I have met presume this is the default behaviour for an LC :-)
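
To make the difference concrete, here is a minimal sketch that puts the
filtering form next to the two strict workarounds. The module and function
names are made up for illustration only.

```erlang
-module(lc_filter_demo).
-export([filtered/1, strict_fun/1, strict_begin/1]).

%% Generator pattern: elements that do not match {_, _} are silently skipped.
filtered(List) ->
    [I1 || {I1, _I2} <- List].

%% Strict map: the fun has a single clause, so a non-matching element
%% raises function_clause.
strict_fun(List) ->
    [(fun({I1, _I2}) -> I1 end)(Elem) || Elem <- List].

%% Strict map: the explicit match raises badmatch on a non-matching element.
strict_begin(List) ->
    [begin {I1, _I2} = Elem, I1 end || Elem <- List].
```

With List = [{a,b},{c,d,e,f},{h,i}], filtered/1 gives [a,h], while
strict_fun/1 fails with function_clause and strict_begin/1 with badmatch as
soon as they reach the 4 element tuple.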

2) List comprehensions can't normally contain multiple statements, possibly
because it might confuse users into thinking that
[do(),do2(),do3()||_<-List] would add three elements to the list each time
(a syntax that did that could be really useful tho!). You can however use
the "begin end" syntax, also shown above, like :
[begin do(),do2(),do3() end ||_<-List] to make it all "one statement" (and
you will end up with a list of the results of do3()).
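
A small sketch of both ideas, with made-up names: run/1 uses "begin end" so
only the last expression lands in the result, and three_each/1 shows one way
(a second generator over a literal list) to get several elements per input
element without any new syntax.

```erlang
-module(lc_begin_demo).
-export([run/1, three_each/1]).

%% Several expressions per element; only the value of the last one
%% (Incremented) ends up in the resulting list.
run(List) ->
    [begin
         Doubled = X * 2,
         Incremented = Doubled + 1,
         Incremented
     end || X <- List].

%% Three result elements per input element, via a nested generator.
three_each(List) ->
    [Y || X <- List, Y <- [X, X * 2, X * 3]].
```

Here run([1,2,3]) gives [3,5,7] and three_each([1,2]) gives [1,2,3,2,4,6].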

3) This, i feel, is a VERY annoying scoping bug :

OuterVar = 'a', List = [{a,b},{c,d},{e,f}],

Y : [I2||{OuterVar,I2}<-List]

Z : [I2||{I1,I2}<-List, I1==OuterVar]

These do not produce the same result.

In "Y:" the "OuterVar" gets shadowed by a fresh variable, not matched (the
compiler does warn that it is shadowed in the generator), so you end up with
every element matching. In "Z:" it behaves (as "Y:" should) and filters out
all but the matches with 'a'.
This is unlike a case, where
case input of Input -> ok; _ -> other end,
will give ok, but if you set Input beforehand to be, say,
'not_an_input_atom' it will give 'other'.

I hate this behaviour... :-)
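
Here is a sketch of the two forms as compilable functions (names invented
for the example). Note that filters in a comprehension are separated with a
comma rather than "when". Compiling this should produce a "shadowed in
generate" warning for shadowed/2, which is the only hint you get.

```erlang
-module(lc_shadow_demo).
-export([shadowed/2, filtered/2]).

%% The OuterVar in the generator pattern is a fresh variable that shadows
%% the function argument, so every 2-tuple matches.
shadowed(OuterVar, List) ->
    [I2 || {OuterVar, I2} <- List].

%% An explicit filter compares against the outer value instead.
filtered(OuterVar, List) ->
    [I2 || {I1, I2} <- List, I1 =:= OuterVar].
```

With List = [{a,b},{c,d},{e,f}], shadowed(a, List) gives [b,d,f] and
filtered(a, List) gives [b].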

4) You can write generatorless LCs, with only filters after the ||.
These act like "if all the filters are true give me a one element list
holding the expression, otherwise give me an empty list". They look crazy!
But they can be useful when generating iolists (or *not* iolists :-) to
avoid case statements.
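
A sketch of the iolist use, assuming a made-up greeting/2 function where an
optional part is dropped when it is undefined:

```erlang
-module(lc_nogen_demo).
-export([greeting/2]).

%% The inner comprehension has no generator, only a filter. It evaluates to
%% [[Title, " "]] when Title is set and to [] otherwise, and both are fine
%% inside an iolist.
greeting(Title, Name) ->
    ["Hello ", [[Title, " "] || Title =/= undefined], Name].
```

iolist_to_binary(lc_nogen_demo:greeting("Dr", "Joe")) gives
<<"Hello Dr Joe">>, and with undefined as the title it gives
<<"Hello Joe">>.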

5) custom guards and error handling :
after much thought i feel this is the correct behaviour, though it's
surprising :

List = [1,2,three,4,5],

Y : [I1||I1<-List, I1+1>0 ]

Z : Guard = fun(I1) -> I1+1>0 end,
[I1||I1<-List, Guard(I1)]

"Y :" will filter out 'three' and "Z :" will crash, and importantly it
crashes when it hits 'three' and not before. Same code! However, having
guards that return false rather than crash saves you from writing complex
is_type() checks etc., whereas when you call a fun you could be calling huge
functions, even ones with side effects; silencing all of that would make no
sense. It also allows you to write filters that crash, and so make the code
a tad less defensive.
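
The same pair as a compilable sketch (module and function names are
illustrative). The filter in guard_filter/1 is a legal guard expression, so
it is evaluated with guard semantics and a failure just counts as false; the
call to a fun in fun_filter/1 is ordinary code and can raise.

```erlang
-module(lc_guard_demo).
-export([guard_filter/1, fun_filter/1]).

%% I + 1 > 0 is a valid guard test, so badarith on a non-number is
%% swallowed and the element is simply dropped.
guard_filter(List) ->
    [I || I <- List, I + 1 > 0].

%% Calling a fun is not allowed in a guard, so this is a plain expression
%% and the badarith escapes.
fun_filter(List) ->
    Check = fun(I) -> I + 1 > 0 end,
    [I || I <- List, Check(I)].
```

guard_filter([1,2,three,4,5]) gives [1,2,4,5], while fun_filter/1 on the
same list fails with badarith.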

6) list comprehensions crash with an invalid list, but only when it reaches
the tail element :
[io:format("~p~n",[Elem])||Elem<-[1,2,3,4|invalid_tail]]
will first print out 1 to 4, then crash
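
A sketch that catches the late crash so you can see what it looks like (the
run/1 wrapper is invented for the example). The error raised at the bad tail
is {bad_generator, Tail}.

```erlang
-module(lc_tail_demo).
-export([run/1]).

%% Prints every proper element first, then fails only when the generator
%% reaches a tail that is not a list.
run(List) ->
    try
        [io:format("~p~n", [Elem]) || Elem <- List]
    catch
        error:{bad_generator, Tail} -> {crashed_at_tail, Tail}
    end.
```

lc_tail_demo:run([1,2,3,4|invalid_tail]) prints 1 to 4 and then returns
{crashed_at_tail, invalid_tail}.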

*B) variable name repetition matchings :

you can write :
function_name(Elem1,Elem1)->this_matches;
function_name(Elem1,Elem2)->this_does_not_match.

and when you call function_name(100,100) the first clause will match. This
may seem obvious, but in many cases it can cut out a lot of code where you
would otherwise be writing guards or dropping into a case, especially when
you have more complex destructuring pattern matches combined with
recursion. I use this in much of my code, as it feels more functional than
lots of == or case etc..
:-)
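
A small sketch of the recursion case, with invented function names. One
thing to keep in mind is that a repeated variable is compared like =:=, not
==, so 1 and 1.0 do not match each other.

```erlang
-module(repeat_var_demo).
-export([same/2, dedup/1]).

%% The first clause only matches when both arguments are exactly equal.
same(X, X) -> true;
same(_, _) -> false.

%% Drops consecutive duplicates: [X, X | Rest] only matches when the first
%% two elements are equal, so no guard or case is needed.
dedup([X, X | Rest]) -> dedup([X | Rest]);
dedup([X | Rest])    -> [X | dedup(Rest)];
dedup([])            -> [].
```

dedup([1,1,2,3,3,3,1]) gives [1,2,3,1], same(100, 100) gives true and
same(1, 1.0) gives false.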

*C) pattern matches can be anywhere
Everything in Erlang returns a value, even a pattern match, which returns
the matched value, so :

[elem1,Var1 = elem2,elem3] will give the list [elem1,elem2,elem3] but also
set Var1 to elem2. In a more complex case this can (with reasonable use)
reduce code a lot as it lets you change the order you build up code, so the
last element that returns is the one you want.. you don't need to build up
an intermediate value, do some code, then place that var at the end to
return.

They can also be used in function heads like :
function(Elem1={_,_,_})-> Elem1.
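
Both uses in one sketch (names are made up): a match embedded in a list
constructor, and a function head that binds the whole tuple as well as its
parts.

```erlang
-module(match_expr_demo).
-export([build/0, whole_and_parts/1]).

%% The match inside the list evaluates to elem2 and also binds Var1,
%% which stays usable afterwards.
build() ->
    List = [elem1, Var1 = elem2, elem3],
    {List, Var1}.

%% Point is bound to the whole 3-tuple, X and Y to its first two elements.
whole_and_parts(Point = {X, Y, _Z}) ->
    {Point, X + Y}.
```

build() gives {[elem1,elem2,elem3], elem2} and whole_and_parts({1,2,3})
gives {{1,2,3}, 3}.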

*D) "begin end" statements can be anywhere

this can be used to embed more than one statement when building a list etc :
[elem1,
elem2,
begin
{Elem3,_,_,_ } = fetch_elem3(),
Elem3
end,
elem4]

It's also possible to put case, if etc.. statements there too ( as
everything in Erlang returns a value), like :
[elem1,
elem2,
case Elem3 of
_ when is_atom(Elem3) -> atom_to_list(Elem3);
_ -> Elem3
end,
elem4]

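*E) case ( and others like if, receive etc.. ) drop the variables they bind
out the bottom ( i.e. they stay in scope even after the expression has
finished), as long as every clause binds them :
case input_atom of Input -> ok end,
Input is now still 'input_atom'. If you add a second "; _ -> other" clause
that does not bind Input, the compiler rejects any later use of Input as
unsafe. Variables bound inside a try are also unsafe once the try is over.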
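
A minimal sketch with an invented classify/1: Kind is bound in every clause
of the case, so it is safe to use after the "end".

```erlang
-module(case_export_demo).
-export([classify/1]).

%% Kind is bound in both clauses, so it is still in scope after the case.
classify(N) ->
    case N rem 2 of
        0 -> Kind = even;
        _ -> Kind = odd
    end,
    {N, Kind}.
```

classify(3) gives {3, odd} and classify(4) gives {4, even}. Plenty of people
prefer writing Kind = case ... end instead, which avoids relying on this
scoping rule at all.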

*F) Operators have Type Precedence
"hello < 10" etc.. does not throw but returns false.
the atom 'aaa' is less than the atom 'aab' etc..

There may be small use cases for this like implementing generic sorting
functions etc.. but 99% of the time if you magnitude compare two different
types ( without intermediate type conversion, and ignoring float/int
differences) this is a huge bug in your code.

I can see the equivalent of evaluating "250"<250 being a common mistake, and
always getting false is hardly useful. Also, this behaviour can hardly help
Dialyzer spot common type mistakes, as this is valid code. Does this ever
prove to be a big problem in large scale systems, or not really? Or does
producing a module full of :

'>'(I1,I2) when is_number(I1), is_number(I2) -> I1>I2. ( '>' could be
"greater_than" or "gt" or similar)
which crashes on non-numeric input, produce helpful crashes at the point of
error? Or is it just overkill?
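
A sketch of both sides, using made-up helper names: lt/2 is the plain
operator, sorted/0 shows the cross-type order that lists:sort relies on, and
gt/2 is the strict numeric-only wrapper suggested above.

```erlang
-module(term_order_demo).
-export([lt/2, sorted/0, gt/2]).

%% Plain comparison never crashes; across types it falls back to the
%% term order (numbers first, then atoms, later tuples, then lists).
lt(A, B) -> A < B.

%% Sorting a mixed list makes that order visible.
sorted() -> lists:sort([b, 2, "x", {t}, 1.5, a]).

%% Strict version: only defined for numbers, so mixed types crash with
%% function_clause at the point of the mistake.
gt(A, B) when is_number(A), is_number(B) -> A > B.
```

lt(hello, 10) and lt("250", 250) both give false, sorted() gives
[1.5,2,a,b,{t},"x"], and gt("250", 250) fails with function_clause instead
of quietly answering.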

*G) {module,function}(Inputs)
The tuple pair can be used as a way of calling functions, tho the
recommended style is: fun mod:funct/arity but still fun to know.

*H) Funs can be self executing by wrapping () after them like : fun
(FunInput) -> some_code() end(Input)

*J) try/catch:
1) - in the ERROR_TYPE:ERROR catch pattern the "throw:" can be left out, so
a bare ERROR means the same as throw:ERROR.. nice short hand.
There is also a 3rd parameter in the AST that is always a match all ( a bit
like ERROR_TYPE:ERROR:_ ) does anybody know what this means? is there an
extra option in a catch?

( maybe everybody knows these last few but just to be sure )
2) if nothing matches in the catch statements the error propagates up as if
there was no try catch at all, very useful and stops the need for
re-throwing like other languages!
3) in "try CODE of ... catch" the "of" is optional & the try returns the
value of CODE if it is left out. Also ( maybe v obvious) exceptions raised
by the code in the "of" section are not caught
4) the old, strange and broken "catch" requires wrapping in parens if you
want its return value:
Var = catch throw(100). seems to be a syntax error
Var = (catch throw(100)). is fine
also distinguishing between certain throws and an ordinary return can be
impossible.
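
The four points above as a sketch, with illustrative function names. Each
function returns an atom saying which handler ran, so the behaviour can be
checked from the shell.

```erlang
-module(try_demo).
-export([default_class/0, propagate/0, of_not_caught/0, old_catch/0]).

%% 1) A pattern without a class means throw:Pattern.
default_class() ->
    try throw(oops)
    catch
        oops -> caught_throw
    end.

%% 2) The inner catch has no clause for class error, so the exception
%%    carries on to the outer try without any re-throwing.
propagate() ->
    try
        try error(boom)
        catch
            throw:_ -> inner
        end
    catch
        error:boom -> outer
    end.

%% 3) An exception raised in the "of" section is not handled by the
%%    catch of the same try, only by an enclosing one.
of_not_caught() ->
    try
        try ok of
            ok -> throw(from_of)
        catch
            throw:from_of -> inner
        end
    catch
        throw:from_of -> outer
    end.

%% 4) With the old catch, a thrown 100 and a returned 100 look the same.
old_catch() ->
    {(catch throw(100)), (catch 100)}.
```

default_class() gives caught_throw, propagate() and of_not_caught() both
give outer, and old_catch() gives {100, 100}.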

*K) records
( this got too long so I made it a follow on email :-) basically I wish to
write a (better) parse transform fix )

If after corrections/some extra suggestions someone wants to put this on
their blog then feel free, I don't have one :-)
James