[proposal] Declarative syntax for metadata (long!)

Thu Mar 18 12:34:08 CET 2010

Hi all,

I feel I should follow up on my rant from last night (I hope the tone
wasn't too harsh!). I have been thinking about these things for a long
while, but I'm sure I am still missing some points.

Besides actual code, Erlang source also contains metadata and
configuration data, disguised as Erlang terms, comments or strings.
Some of this data has been integrated into the language (type
specifications, for example), but not all of it. I would like to
present arguments for and against integrating all such data.

First a non-exhaustive list of data that IMHO would benefit from
becoming a first-class citizen of the language:
- behaviour callback specifications
- supervisor child specifications
- match specifications
- edoc

Today, there are three ways to encode this data: as Erlang terms, as
strings and as comments. (I know strings are terms, but they are
different)

Comments are used to store structured documentation info (edoc) and
are the weakest case for "uplift" to citizenship. The argument for it
is that there is a mini-language involved anyway, so almost everything
is already in place. I think that there is a very simple way to go all
the way: introduce block comments. The main problem for me is those
pesky "%%" when reformatting the documentation. It would be nice if
there were special edoc comments, so that it's easy for a tool to tell
them apart from regular comments.

Erlang terms are a very flexible representation that works fine at the
lowest level, but I argue that programmers shouldn't be forced to
think at that level. Using Erlang terms is in fact forcing the
programmer to do the work of a parser and convert a high-level
declaration into a bunch of terms with complex structure (thus easy to
get wrong). If integrated in the language, the parser and compiler
would be able to detect errors and inconsistencies that otherwise
would result in run-time bugs.
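
To make this concrete, here is a minimal sketch using a supervisor
child specification. The module name childspec_demo and the worker
module worker1 are made up for illustration; the only real API used
is supervisor:check_childspecs/1. The compiler accepts both terms
without complaint, and the misspelled restart type is only noticed
when the supervisor code inspects the term at run time.

```erlang
-module(childspec_demo).
-export([good/0, bad/0, check/1]).

%% A well-formed child spec in the classic tuple format.
%% worker1 is a hypothetical callback module; it need not exist
%% for the spec to be checked.
good() ->
    {worker1, {worker1, start_link, []}, permanent, 5000, worker, [worker1]}.

%% Same spec with a typo in the restart type. To the compiler this is
%% just another atom, so nothing is reported at compile time.
bad() ->
    {worker1, {worker1, start_link, []}, permanant, 5000, worker, [worker1]}.

%% Ask the supervisor module to validate the spec at run time.
%% Returns ok for a valid spec and {error, Reason} otherwise.
check(Spec) ->
    supervisor:check_childspecs([Spec]).
```

Calling childspec_demo:check(childspec_demo:bad()) returns an error
tuple, which is exactly the kind of mistake a parser for a dedicated
child-spec syntax could reject before the code ever runs.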

Strings are a special case because they can contain the high-level
declarations I mentioned above and thus are easier to reason about,
but they are still not properly parsed and any error will reveal
itself at run-time.

Some of the data encoded as terms is already partially integrated:
some match specifications can be written using the fun_ms parse
transform. What I argue for here is going all the way and providing
this kind of support for all of them and for the other data mentioned
here, hopefully without having to clutter all files with
parse_transform declarations.
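
As an illustration of the existing partial integration, the sketch
below writes the same match specification twice: once by hand as raw
terms and once via ets:fun2ms/1 and the ms_transform parse transform.
The module name ms_demo and the "value greater than 10" condition are
only examples I picked for this purpose.

```erlang
-module(ms_demo).
-export([by_hand/0, generated/0]).

%% Pulls in the parse transform that rewrites ets:fun2ms/1 calls
%% into literal match specifications at compile time.
-include_lib("stdlib/include/ms_transform.hrl").

%% Match {Key, Value} tuples where Value > 10 and return Key,
%% spelled out as a raw match specification.
by_hand() ->
    [{{'$1', '$2'}, [{'>', '$2', 10}], ['$1']}].

%% The same specification written as a fun; the parse transform
%% checks it and produces the term form.
generated() ->
    ets:fun2ms(fun({Key, Value}) when Value > 10 -> Key end).
```

The fun version is checked by the parse transform at compile time,
while a mistake in the hand-written version only shows up when the
specification is used (or when it is tried out with ets:test_ms/2).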

A closely related issue is that some of this data is declarative but
returned by magic functions. I can't see any use case (except for
match specifications) where these specifications need to be generated
dynamically, so the data could just as well be provided by an
attribute. The compiler can generate the magic functions if needed, or
(better IMHO) we could provide a better API to module attributes. The
advantage of using attributes is that we can use specific
mini-languages that fit the domain better, because we're not limited
to Erlang expressions.
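
A rough sketch of the two styles, side by side. behaviour_info/1 is
the existing "magic function" convention for behaviours; the
attribute name my_callbacks and the module name my_behaviour are
purely hypothetical, and reading the attribute back through
module_info(attributes) is just what is available today, not the
better API I have in mind.

```erlang
-module(my_behaviour).
-export([behaviour_info/1, callbacks_from_attribute/0]).

%% Hypothetical user-defined attribute carrying the same declaration.
%% The compiler stores it as-is; it gives it no meaning.
-my_callbacks({handle, 2}).

%% Today's convention: static data returned by a magic function.
behaviour_info(callbacks) -> [{handle, 2}];
behaviour_info(_Other)    -> undefined.

%% Reading the attribute back. A non-list attribute value is stored
%% wrapped in a list in the module's attribute table.
callbacks_from_attribute() ->
    Attrs = ?MODULE:module_info(attributes),
    proplists:get_value(my_callbacks, Attrs, []).
```

Both functions describe the same callback, but only the attribute
form could be given its own syntax and be checked by the compiler
without evaluating any code.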

<exploratory_mode on>
By emphasizing declarative features in the language we can start
considering other things that can be handled in a similar way, thereby
moving to a higher level of abstraction. As Mikael also mentioned,
contract declarations (UBF or other kinds) come to mind easily. We can
extend supervisor child specs to a description of all processes in an
application or even the whole system.

Of course, this can already be done today. The problem is that without
a dedicated language, the declarations end up very difficult to read,
to reason about and to debug. In some cases they can even be more
verbose than the Erlang code that would achieve the same result.
Working at a higher level gives a better understanding.

Now, going even further into the future: if this turns out to work
as well as I hope it will, making everybody twice as productive and
twice as happy, it might happen that more and more applications and
tools will see benefits from going the same way, but for non-OTP
applications it won't be possible to integrate "their" declaration
mini-language. I see two ways to handle this, both being definitely
not something one could throw together over a weekend:
  - allow application-defined parsers to be called on parts of the
source code. This could provide even cooler functionality, but I'm not
going to dig into that right now :)
  - define a single declaration language that can be extended with
user code and with extensible syntax. By that I mean something in the
spirit of Ruby, where it is extremely easy to write domain-specific
languages that are at the same time correct Ruby programs.
</>

== Conclusion ==

The part in the beginning is something that I think is useful,
relatively easy to specify and implement and without too many
compatibility issues. Does anybody share this opinion? Is it EEP
material? Is even the exploratory_mode part worth detailing right now?
(so that when 5 or 10 years from now someone wants to implement it,
he/she doesn't discover it requires rewriting everything).

Another way to put the question is: what parts (if any) should become
part of the Erlang language, and what parts (if any) fit better in an
Erlang-based language?

I feel that it is important to sometimes raise one's eyes and try to
get a glimpse of what the future holds and even to try to shape it.

best regards,
Vlad