Erlang language issues

Tue Apr 16 07:37:29 CEST 2002

Sorry that this is an equally long (actually, longer!) response to a long
message... it's a topic that is of great interest to me :)

On Mon, 15 Apr 2002 16:23:34 +0200 (MET DST)
Richard Carlsson <richardc@REDACTED> wrote:

> For some reason, the original and/or were defined to evaluate both
> arguments. Personally, I would love to give and/or the semantics
> of andalso/orelse, but previous experience suggests that any
> semantic changes in Erlang, however obscure, are politically
> impossible. Could this be an exception to the rule?
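
To make the difference concrete, here is a minimal sketch (the module and
function names are made up for illustration).  With 'and', the right-hand
side is evaluated even when the left-hand side is already false, so it can
still raise an exception; with 'andalso' it is skipped.

```erlang
-module(shortcircuit_demo).
-export([strict/1, lazy/1]).

%% 'and' evaluates both operands: for an empty list, hd/1 is still
%% called and raises badarg.
strict(L) ->
    (L =/= []) and (hd(L) =:= a).

%% 'andalso' evaluates the right operand only if the left one is true,
%% so the empty list simply yields false.
lazy(L) ->
    (L =/= []) andalso (hd(L) =:= a).
```

Calling lazy([]) gives false, while strict([]) exits with a badarg error.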

With any production language, retaining backwards-compatibility is
important, some might even say crucial.

Languages are never designed 100% correctly the first time, of course; such
an expectation would be unrealistic.  But once an imperfect language is in
use, it has to stay imperfect... one might say that it experiences
'language pollution' after a time.  Things like erlang:hash/1, the lib
module, the orddict module, the assumption that ports are synonymous with
named pipes, etc, are all deprecated and in theory will disappear someday
- but in practice, it seems likely they won't, because they can't without
breaking someone's old code somewhere.  It's somewhat akin to archaic
words in spoken language.  Very few people use the word 'wherefore' in
conversation these days, but if you don't know the word 'wherefore', you
can't enjoy Shakespeare!

On the other hand, if all you want to do is enjoy Shakespeare, you don't
need to know the word 'microwave'...

Back in terms of programming languages - sometimes the pressure from
language pollution encourages people to design a fresh new language which
discards the 'mistakes' of the old language while retaining all the things
it got 'right'.  Java, over C++, might be an example of that.  I doubt
that there is enough pressure currently to completely redesign and migrate
to a "Rational Erlang", but one could also say that it's only a matter of
time.  (That's kind of a rhetorical excuse, though.)

> Chris Pressey wrote:
> 
> - Allow ANY function in guard tests.
> - Only WARN if the guard test does not operate in constant time.
> - Deduce which functions operate in constant time by analysis
> 
> The main problem with allowing any function in guards is not
> really the time (after all, length/1 is allowed),

Well, this is beside the point, and splitting hairs too, but is it written
in stone that length/1 must operate in O(n) time?  The length of a list -
as well as its tail pointer - could be cached for quick retrieval, and
this could make some list operations quicker (at the cost of other
operations).

Whether this would be 'breaking backwards-compatibility' or not is a tough
question, though.  Generally I'd say no, but for Erlang, being very
sensitive to execution performance, I might make an exception.
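
As a rough sketch of that trade-off at the library level (this is not a
change to the built-in list representation, and it does not cover the tail
pointer), a list can be carried around together with its length.  The
clist module and all of its functions are hypothetical names.

```erlang
-module(clist).
-export([new/0, from_list/1, cons/2, len/1, to_list/1]).

%% A "counted list" is a pair {Length, List}.
new() -> {0, []}.

%% Pays the O(n) cost of length/1 once, up front.
from_list(L) when is_list(L) -> {length(L), L}.

%% Every cons has to update the counter: this is the extra cost
%% on other operations.
cons(X, {N, L}) -> {N + 1, [X | L]}.

%% The length is now a constant-time tuple lookup.
len({N, _}) -> N.

to_list({_, L}) -> L.
```

The length lookup becomes cheap, but every operation that builds a list has
to maintain the count.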

> but the side
> effects (such as sending messages or updating ETS tables). The
> assumption when a guard fails is that the state of the world did
> not change.

Very true.  Side-effects in a guard would be very bad programming practice
- usually a dire mistake, and when not, then certainly an awkward and
objectionable coding style.

But, Erlang treats this almost as a syntactic rule, rather than a semantic
one - the syntax of a guard expression disallows side-effects.  This seems
a bit strict - very understandable, considering the origins of the
language, but a bit strict nonetheless.  Reformulating it as a semantic
rule would be a bit nicer, in my mind.
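
A small sketch of what the syntactic rule means in practice (guard_demo and
is_small/1 are made-up names): even a side-effect-free local function is
rejected in a guard at compile time, so the test has to be moved into the
body.

```erlang
-module(guard_demo).
-export([classify/1]).

%% A pure, user-defined predicate.
is_small(N) -> is_integer(N) andalso N < 10.

%% The compiler rejects this clause, because is_small/1 is not a
%% guard BIF, even though it has no side effects:
%%
%%   classify(N) when is_small(N) -> small;

%% What has to be written instead: call the predicate in the body
%% and branch on the result.
classify(N) ->
    case is_small(N) of
        true  -> small;
        false -> other
    end.
```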

> Also, guards are used in 'receive': sending a message
> from a guard expression in a receive-clause, or evaluating a
> receive within a receive, could cause major inconsistencies (it
> would be very hard to specify what the semantics actually should
> be, if this was allowed).

Absolutely.  Again, my only problem with it is that the restrictions are
more syntax-based than semantics-based.

On a related note, is there a good reason for 'receive' to be a language
structure, rather than a BIF?  For example, instead of

  receive
    {foo, X} -> bar(X);
    {baz, Y} -> quuz(Y)
  end

couldn't one say, with identical intent,

  Msg = receive(),
  case Msg of
    {foo, X} -> bar(X);
    {baz, Y} -> quuz(Y)
  end

Is the reason it's a language structure that the compiler can
more easily generate better code (something like Linda)?
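
For comparison, here is a sketch of how the two forms behave when an
unexpected message is queued first.  As far as I understand the current
semantics, a 'receive' with patterns leaves non-matching messages in the
mailbox, which a plain fetch-then-case would not do.  take_any/0 is a
hypothetical stand-in for the proposed receive(), and run/0 assumes the
calling process starts with an empty mailbox.

```erlang
-module(recv_demo).
-export([selective/0, take_any/0, run/0]).

%% Selective receive: a message matching no clause stays queued.
selective() ->
    receive
        {foo, X} -> {got_foo, X};
        {baz, Y} -> {got_baz, Y}
    end.

%% Stand-in for a receive() BIF: removes the oldest message,
%% whatever it happens to be.
take_any() ->
    receive Msg -> Msg end.

run() ->
    self() ! {other, 1},
    self() ! {foo, 2},
    A = selective(),   % skips {other, 1} and takes {foo, 2}
    B = take_any(),    % {other, 1} is still there and is taken now
    {A, B}.
```

With the BIF form, {other, 1} would have been taken first and the 'case'
would have had to deal with it (or crash with a case_clause error).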

> Analysing an expression to tell whether it is side effect free is
> not that hard, but note that even a call to another module is a
> potential side effect (the module might even not be loaded yet).
> So for user-defined functions, only local calls could be allowed.

And that diminishes its usefulness - as it would be desirable to put
user-defined guards in a common, reusable module - but I can see why,
given the ability to update code while it's running.

I'm tempted to see module loading as a special case of side-effect,
though.  It should only happen once for each time the code is updated,
which would not be a common occurrence.

> Now, guards should be efficiently implemented: even if not always
> taking constant time, the overhead should be small, so a full
> backtracking mechanism like in Prolog is out of the question. The
> current Beam implementation relies quite heavily on being able to
> generate special code for guards, where many things can be
> assumed:
> 
> 	- no side effects can happen
> 	- exceptions only cause a jump to a "failure" label
> 	- all calls are to builtins; special calling conventions
> 	  can be used.
> 	- any created data is not live outside the guard
> 	- (possibly other things...)
> 
> To put it briefly: allowing more general expressions in guards is
> definitely something one would like to do, but it needs a lot of
> work to make it both safe and efficient.

I see...  To me, the most elegant path to achieving this is rigorous
analysis and optimization on the compiler's part.  But this definitely has
drawbacks, not the least of which are the complexity of the compiler and
long compile times.

> Chris also suggested:
> 
> > - Possibly have a module 'type' and place all the type-assertion
> > functions in it (type:number(X), type:list(X), etc)
> 
> Structured module namespaces are just making their way into the
> language, so you may see something like "erl.lang.term:is_list(X)"
> in the future.

Oddly, I've never been too concerned that Erlang has such a 'flat'
namespace structure.  I think a deep hierarchy would be even worse,
unless perhaps there was some sort of 'search path'-like mechanism for
resolving names.

> > As for short-circuiting, you shouldn't have to think about it in
> > referentially transparent code, and (for me at least) it's
> > fairly rare that I have to think about it in side-effectful code
> > either, so I'm not sure why orelse and andalso were introduced,
> > when two separate tests in the code is more explicit and
> > possibly clearer.
> 
> Of course, if the compiler can decide that the RHS of an 'and' is
> ref. transparent *and* type-safe (i.e., can't cause an exception),
> it can generate short-circuit code. But this is often not possible
> (outside guards). Using andalso/orelse lets you express that you
> know what you are doing.
> 
> Having two separate tests is not clearer. Compare e.g.:
> 
> 	is_string([X | Xs]) ->
> 	    is_char(X) andalso is_string(Xs);
> 	is_string([]) ->
> 	    true.
> to:
> 	is_string([X | Xs]) ->
> 	    case is_char(X) of
> 	        true ->
> 	            is_string(Xs);
> 	        false ->
> 	            false
> 	    end;
> 	is_string([]) -> true.
> 
> With the explicit test, you have to make sure you get the
> true/false cases right (which gets error prone if there is more
> than one level) when writing it, and furthermore, someone reading
> your code must ask him/herself "what does this (nested) switch
> here really implement?"

Perhaps this is a strange way to approach it, but it seems to me that
Erlang's evaluation-order-closely-follows-source-code-order feature could
be exploited (if user-defined functions were allowed in guards) to write a
fairly clear, non-nested version of this as:

  is_string([X | Xs]) when not is_char(X) -> false;
  is_string([X | Xs]) when not is_string(Xs) -> false;
  is_string(X) when is_list(X) -> true.
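
Since guards do not currently accept user-defined functions, the version
above is only what one would like to write.  A flat version that should
compile today has to inline the character test using guard BIFs only.  This
is a sketch: the module name is made up, and treating a 'char' as an
integer in the range 0..255 is my assumption, not something is_char/1 is
known to do.

```erlang
-module(strdemo).
-export([is_string/1]).

%% Clauses are tried top to bottom, so the order does the work that
%% the nested 'case' did.
is_string([X | Xs]) when is_integer(X), X >= 0, X =< 255 ->
    is_string(Xs);
is_string([]) ->
    true;
is_string(_) ->
    %% Anything else: a non-character head, an improper tail, a non-list.
    false.
```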

On the topic of type-safeness... I am somewhat confounded by the fact that
all errors are treated equally.  This may be somewhat difficult to
explain, but I don't feel that a type error and (e.g.) a file-not-found
error are of the same gravity.  What I am doing more and more often in my
code is writing 'wrapper' functions around BIFs that throw errors.  One
example of this is list_to_integer/1.  Often I'm in a position where I
want to convert a string to an integer, even if it isn't a legal integer. 
I ended up writing a function like:

  my_list_to_integer(List, Default) ->
    case catch list_to_integer(List) of
      X when is_integer(X) -> X;
      _ -> Default
    end.

Some might say that this comes from ingrained habit of working with
languages with bad error-handling capabilities, and that may be true.  But
then again, string:str/2 doesn't throw an error when the substring is not
found, so... is it really 'wrong'?
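
For what it's worth, the wrapper above swallows every kind of failure.  In
an Erlang release that has try ... catch, a sketch that treats only the
conversion error as 'soft' could look like this (conv_demo and
list_to_integer_or/2 are made-up names):

```erlang
-module(conv_demo).
-export([list_to_integer_or/2]).

%% Returns Default only when list_to_integer/1 fails with badarg;
%% any other kind of error still propagates to the caller.
list_to_integer_or(List, Default) ->
    try
        list_to_integer(List)
    catch
        error:badarg -> Default
    end.
```

For example, list_to_integer_or("42", 0) gives 42 and
list_to_integer_or("abc", 0) gives 0.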

(Also it would be more consistent for the indexes in GS stuff to be
1-based, not 0-based, but that is really digressing from the point...)

> 
> Chris also wrote:
> 
> > more and more I am finding myself disagreeing with the canon law
> > that message passing ought to be encapsulated in function calls.
> > I agree that it ought to be *encapsulated* for the sake of
> > abstraction, but I believe that a function is the wrong thing to
> > encapsulate it in! Functions traditionally do not have side
> > effects. If I have a piece of Erlang code like
> > 
> >   Z = some_function(X,Y).
> > 
> > then I cannot immediately tell if there are side effects
> > involved. [...] However, if I have a piece of Erlang code like
> > 
> >   some_server ! {some_function, self(), [X,Y]},
> >   Z = receive
> >     Any -> Any
> >   end.
> > 
> > then I can *immediately* tell at a glance that there *are* side
> > effects involved,
> 
> I agree: there should (in a better world) be another kind of
> abstraction for this purpose.

One such thing I thought of - not particularly elegant - is a 'relay
process', a process whose entire purpose is simply to pass any messages it
receives on to another process - possibly translating them in transit.
The translation provides a place for the abstraction to happen.

But this seems, at the very least, somewhat wasteful.
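
A minimal sketch of such a relay, just to show the shape of the idea (the
relay module and its function names are made up, and there is no error
handling or shutdown protocol):

```erlang
-module(relay).
-export([start/2, loop/2]).

%% Target is the pid that should receive the translated messages.
%% Translate is a fun of one argument, applied to every message.
start(Target, Translate) ->
    spawn(?MODULE, loop, [Target, Translate]).

%% Forward every incoming message, translated, then wait for the next.
loop(Target, Translate) ->
    receive
        Msg ->
            Target ! Translate(Msg),
            loop(Target, Translate)
    end.
```

A client would then send its 'abstract' message to the relay, for example
R = relay:start(Server, fun(X) -> {request, X} end), followed by R ! hello,
and only the fun needs to know the concrete message format.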

> The real issue is not that the
> send/receive are explicit in the code - because they often ought
> to be - but that they expose the data structures used in the
> message passing (or to be specific, the 'receive' does). A way to
> handle this would be to have abstract patterns (O'Keefe 1998), but
> from what I have heard, implementing this idea in an efficient way
> turned out to be a lot more difficult than expected.

It might not be necessary to go that far (but abstract patterns might
clear up a lot of other issues as well) - I think mainly it's a matter of
breaking the habit of thinking in terms of what is being sent as data.  Or
'raw' data, if you like.  A function can be thought of in terms of 'raw'
data - it is an atom or two, for the module and function name, and a list
of terms, for the arguments.  But we don't generally think of it this way,
for reasons of abstraction (although we DO think of it this way when using
spawn/3, but never mind that! :)  If messages could be thought of as
similarly consistently 'packaged' data, that might help matters a bit,
without going all the way into abstract patterns.

Considering spawn/3, and the syntax used in ets:match, I would say that
Erlang is a bit weak when it comes to reflectivity - being able to
describe Erlang code in Erlang.  Ideally the syntax for spawn'ed functions
would closely match that of non-spawn'ed functions, and the syntax for
ets:match'ing would closely resemble that for regular case ... of ... end
matching.  Abstract patterns seem to offer a comprehensive, if somewhat
drastic (overkill), solution to all these things and more (like macros).
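
To illustrate the ets:match point with a small sketch (the module and table
names are made up): the pattern handed to ets:match/2 is ordinary data, with
atoms like '$1' standing in for variables, whereas the 'case' pattern uses
real variables.

```erlang
-module(reflect_demo).
-export([run/0]).

run() ->
    T = ets:new(demo, [set]),
    true = ets:insert(T, [{foo, 1}, {bar, 2}]),
    %% Pattern as data: '$1' marks the part to be returned.
    FromEts = ets:match(T, {foo, '$1'}),
    %% The same match written as a normal pattern with a variable.
    FromCase = case {foo, 1} of
                   {foo, X} -> [[X]]
               end,
    true = ets:delete(T),
    {FromEts, FromCase}.
```

Both halves of the result are [[1]], but the two patterns look nothing
alike in the source.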

My, I do tend to ramble on, don't I?  Sorry about that :)

Chris