Erlang language issues

Mon Apr 15 16:23:34 CEST 2002

Hi everybody! I'll try to give some answers to the questions asked in
the latest Erlang language debate. (This got rather long; be warned.)

Following the "Compiler bug?" confusion, Thomas Lindgren asked:

> Perhaps it is time to raise the question whether there are
> getting to be too many similar operations in Erlang as well?
>
> For guard operations, we have type(X) and is_type(X). (Could
> this be unified into just type(X)?)

Originally, the only builtin type tests were of the form
"integer(X)" etc., and these could only be accessed in guards
which effectively have a separate name space, so you could not
write:

	Bool = integer(X)

anywhere outside a guard. Indeed, you cannot even use them at the
"guard expression" level, only the "guard test" top level, so the
following clause is not accepted:

	X when integer(X) == true -> ...

because integer(X) is not a "guard BIF", only a "guard test". To
further complicate things, "float(X)" is also defined as a BIF -
even as a guard BIF - with completely different semantics
(int-to-float typecast). Assume that B contains a boolean value.
Then, in the following clause:

	X when B == float(X) ->

float(X) does not refer to the type test, but to the typecast
function.
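
A small sketch of the two meanings, using the "is_" name for the
type test. The module and function names here are made up for
illustration only:

```erlang
-module(float_demo).
-export([convert/1, check/1, same_as_float/2]).

%% float/1 as a conversion BIF: number in, float out.
convert(X) -> float(X).

%% is_float/1 as a type test: any term in, boolean out.
check(X) -> is_float(X).

%% Inside a guard expression, float(X) is the conversion, so this
%% clause compares B against the converted number, not against the
%% result of a type test.
same_as_float(B, X) when B == float(X) -> true;
same_as_float(_, _) -> false.
```

Here same_as_float(1.0, 1) is true, while same_as_float(true, 1.0)
is false even though 1.0 is a float.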

To make things more uniform, all guard tests were given new
alternative names "is_...(X)", which are also proper BIFs/guard
BIFs, and may thus be used anywhere, both as guard tests and in
normal code. Like most BIFs, they belong to the 'erlang' module.

The old type tests must of course remain for backwards
compatibility, but they are now being renamed to the "is_" forms
very early in the compilation.
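
To illustrate what the "is_" forms allow, here is a minimal sketch
(module and function names are hypothetical) that uses the same
test in a function body, in a guard expression, and with an
explicit module qualifier:

```erlang
-module(type_demo).
-export([int_flag/1, int_guard/1, int_remote/1]).

%% The test used as an ordinary expression in a function body.
int_flag(X) ->
    Bool = is_integer(X),
    Bool.

%% The test used at the guard expression level, not only as a
%% top-level guard test.
int_guard(X) when is_integer(X) == true -> yes;
int_guard(_) -> no.

%% The same BIF called with its module name spelled out.
int_remote(X) -> erlang:is_integer(X).
```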

Back to Thomas:

> For guards, we have comma, semicolon (or), and, or, not.
> (Could we merge some operations here?)

This is still work in progress, more or less. Robert Virding
extended the guard syntax to allow 'and' for comma and 'or' for
semicolon, as well as allowing these operators in guard
expressions. 'not' should also be allowed on all levels. It seems
that there still is some bug in the implementation, however.

Here too, the reason for the extension is to make guards more like
other expressions. A complication is that failure in the first
argument of a top-level guard 'or' (semicolon) goes on to try the
other alternative as if there were two clauses, while at the
expression-level the whole 'or'-operation fails immediately.
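
The difference can be sketched as follows; the module and function
names are invented for the example, and I am assuming the extended
guard syntax works as described:

```erlang
-module(guard_or_demo).
-export([with_semicolon/1, with_or/1]).

%% Top-level alternatives: if length(X) fails (X is not a list),
%% the second alternative is still tried.
with_semicolon(X) when length(X) > 0; is_atom(X) -> matched;
with_semicolon(_) -> no_match.

%% Expression-level 'or': if length(X) fails, the whole guard
%% fails, and is_atom(X) never gets a chance to succeed.
with_or(X) when (length(X) > 0) or is_atom(X) -> matched;
with_or(_) -> no_match.
```

With the atom abc as argument, with_semicolon/1 gives matched but
with_or/1 gives no_match; for a non-empty list both give matched.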

> For expressions, we have and, or, andalso, orelse. (Do people
> use "E1 and E2" while relying on E2 being evaluated when E1 is
> false? If not, rename andalso into and, etc.)

For some reason, the original and/or were defined to evaluate both
arguments. Personally, I would love to give and/or the semantics
of andalso/orelse, but previous experience suggests that any
semantic changes in Erlang, however obscure, are politically
impossible. Could this be an exception to the rule?
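
For reference, a minimal sketch of what the current semantics mean
in practice (hypothetical module and function names):

```erlang
-module(bool_demo).
-export([strict/1, lazy/1]).

%% 'and' evaluates both operands, so the division is attempted
%% even when X is 0, and the call exits with badarith.
strict(X) -> (X =/= 0) and (10 div X > 0).

%% 'andalso' stops as soon as the left operand is false, so the
%% division is never attempted when X is 0.
lazy(X) -> (X =/= 0) andalso (10 div X > 0).
```

So lazy(0) returns false, whereas strict(0) raises an exception.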

Chris Pressey wrote:

- Allow ANY function in guard tests.
- Only WARN if the guard test does not operate in constant time.
- Deduce which functions operate in constant time by analysis

The main problem with allowing any function in guards is not
really the time (after all, length/1 is allowed), but the side
effects (such as sending messages or updating ETS tables). The
assumption when a guard fails is that the state of the world did
not change. Also, guards are used in 'receive': sending a message
from a guard expression in a receive-clause, or evaluating a
receive within a receive, could cause major inconsistencies (it
would be very hard to specify what the semantics actually should
be if this were allowed).

Analysing an expression to tell whether it is side effect free is
not that hard, but note that even a call to another module is a
potential side effect (the module might not even be loaded yet).
So for user-defined functions, only local calls could be allowed.
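
Until then, the usual way around the restriction is to evaluate the
user-defined test in the body and switch on the result. A minimal
sketch, where is_small/1 is a made-up local predicate:

```erlang
-module(guard_workaround).
-export([handle/1]).

%% A user-defined predicate. It cannot be written in a guard as
%% "handle(X) when is_small(X) -> ...", since guards only accept
%% the builtin guard tests and guard BIFs.
is_small(X) -> is_integer(X) andalso X < 10.

%% Call the predicate in the body and branch on its result.
handle(X) ->
    case is_small(X) of
        true -> small;
        false -> other
    end.
```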

Now, guards should be efficiently implemented: even if not always
taking constant time, the overhead should be small, so a full
backtracking mechanism like in Prolog is out of the question. The
current Beam implementation relies quite heavily on being able to
generate special code for guards, where many things can be
assumed:

	- no side effects can happen
	- exceptions only cause a jump to a "failure" label
	- all calls are to builtins; special calling conventions
	  can be used.
	- any created data is not live outside the guard
	- (possibly other things...)

To put it briefly: allowing more general expressions in guards is
definitely something one would like to do, but it needs a lot of
work to make it both safe and efficient.

Chris also suggested:

> - Possibly have a module 'type' and place all the type-assertion
> functions in it (type:number(X), type:list(X), etc)

Structured module namespaces are just making their way into the
language, so you may see something like "erl.lang.term:is_list(X)"
in the future.

> As for short-circuiting, you shouldn't have to think about it in
> referentially transparent code, and (for me at least) it's
> fairly rare that I have to think about it in side-effectful code
> either, so I'm not sure why orelse and andalso were introduced,
> when two separate tests in the code is more explicit and
> possibly clearer.

Of course, if the compiler can decide that the RHS of an 'and' is
ref. transparent *and* type-safe (i.e., can't cause an exception),
it can generate short-circuit code. But this is often not possible
(outside guards). Using andalso/orelse lets you express that you
know what you are doing.

Having two separate tests is not clearer. Compare e.g.:

	is_string([X | Xs]) ->
	    is_char(X) andalso is_string(Xs);
	is_string([]) ->
	    true.
to:
	is_string([X | Xs]) ->
	    case is_char(X) of
	        true ->
	            is_string(Xs);
	        false ->
	            false
	    end;
	is_string([]) -> true.

With the explicit test, you have to make sure you get the
true/false cases right (which gets error prone if there is more
than one level) when writing it, and furthermore, someone reading
your code must ask him/herself "what does this (nested) switch
here really implement?"

Chris also wrote:

> more and more I am finding myself disagreeing with the canon law
> that message passing ought to be encapsulated in function calls.
> I agree that it ought to be *encapsulated* for the sake of
> abstraction, but I believe that a function is the wrong thing to
> encapsulate it in! Functions traditionally do not have side
> effects. If I have a piece of Erlang code like
> 
>   Z = some_function(X,Y).
> 
> then I cannot immediately tell if there are side effects
> involved. [...] However, if I have a piece of Erlang code like
> 
>   some_server ! {some_function, self(), [X,Y]},
>   Z = receive
>     Any -> Any
>   end.
> 
> then I can *immediately* tell at a glance that there *are* side
> effects involved,

I agree: there should (in a better world) be another kind of
abstraction for this purpose. The real issue is not that the
send/receive are explicit in the code - because they often ought
to be - but that they expose the data structures used in the
message passing (or to be specific, the 'receive' does). A way to
handle this would be to have abstract patterns (O'Keefe 1998), but
from what I have heard, implementing this idea in an efficient way
turned out to be a lot more difficult than expected.
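
Lacking abstract patterns, the usual compromise is still to keep
the message format in one place, behind a function. A minimal
sketch of that convention; the module, the double request and the
message shapes are all invented for the example:

```erlang
-module(call_demo).
-export([start/0, double/2]).

%% Start a trivial server process.
start() -> spawn(fun loop/0).

%% Client side: the only place that knows what the request and
%% reply messages look like. The unique reference ensures that
%% only the matching reply is picked out of the mailbox.
double(Server, N) ->
    Ref = make_ref(),
    Server ! {double, self(), Ref, N},
    receive
        {Ref, Result} -> Result
    end.

%% Server side: answers each request and waits for the next one.
loop() ->
    receive
        {double, From, Ref, N} ->
            From ! {Ref, 2 * N},
            loop()
    end.
```

The caller writes Z = call_demo:double(Pid, 21) and never sees the
tuples, which is exactly the property Chris objects to: the side
effect is no longer visible at the call site.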

Hal Snyder asked:

> Which reminds me, I wish Erlang had elsif, or cond. Did anything
> come of http://www.bluetail.com/~rv/Erlang-spec/Proposals/cond.shtml?

This is on its way. In preparation, 'cond' was made a reserved
word in R8, along with 'try'. (Adding a new keyword is a fairly
large language change, so people should be duly warned.)

I hope that about covers it. Thank you for your time.

	/Richard

Richard Carlsson (richardc@REDACTED)   (This space intentionally left blank.)
E-mail: Richard.Carlsson@REDACTED	WWW: http://www.csd.uu.se/~richardc/
 "Having users is like optimization: the wise course is to delay it."
   -- Paul Graham