My biggest gripe with Erlang is the limitations of records. Does anyone know when frames will make an appearance?<div><br></div><div><br></div><div>Sergej<br><br><div class="gmail_quote">On Fri, Aug 31, 2012 at 8:20 AM, Richard O'Keefe <span dir="ltr"><<a href="mailto:ok@cs.otago.ac.nz" target="_blank">ok@cs.otago.ac.nz</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">We've just had a thread about what people like about Erlang.<br>

We also had the announcement of TinyMQ.<br>

So I'm going to use this as an example of what's *really*<br>

wrong with Erlang.<br>

<br>

Don't get me wrong.  I endorse everything everyone else has<br>

said in favour of Erlang.  Erlang is like democracy: the worst<br>

thing in its class except for all the others, and something<br>

that is increasingly imitated by people who just don't get<br>

some of the fundamental things about it.<br>

<br>

I also endorse what people have said in praise of TinyMQ.<br>

There are lots of things that it does right:<br>

 - there is a README<br>

 - there are EDoc comments with @specs for the public<br>

   interface<br>

 - the functions and variables are named well enough that<br>

   I was never in doubt about what any part of the code was<br>

   up to, at least not for longer than a second or two<br>

 - the hard work of process management is delegated to OTP<br>

   behaviours<br>

At this point, it's looking better than anything I've written.<br>

<br>

Make no mistake: I am not saying that Erlang or TinyMQ are *bad*.<br>

They are good things; I'm just ranting somewhat vaguely about<br>

why they should be better.<br>

<br>

<br>

LUMPS OF INDISTINGUISHABLE CODE.<br>

<br>

  Up to a certain level of hand-waving, TinyMQ can be roughly<br>

  understood thus:<br>

        The TinyMQ *system* is a monitor<br>

        guarding a dictionary mapping strings to channels,<br>

  where<br>

        a channel is a monitor<br>

        guarding a bag of subscribers and<br>

        a sliding window of {Message, Timestamp} pairs.<br>

<br>

  YOU CANNOT SEE THIS AT A GLANCE.<br>

<br>

  This is not Evan Miller's fault.  *Anything* you write in<br>

  Erlang is going to end up as lumps of indistinguishable code,<br>

  because there is nothing else for it to be.<br>
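<br>

  As a partial remedy, the shape described above can at least be<br>
  written down as declarations near the top of a module.  This is a<br>
  minimal sketch, not TinyMQ's code: the module name, the record, the<br>
  field names, and the integer timestamp are all assumptions made<br>
  for illustration.<br>

```erlang
-module(tinymq_shape).
-export([new_system/0, new_channel/0]).

%% Illustrative only: these names are assumptions, not TinyMQ's own.
-type timestamp() :: integer().
-type window()    :: [{Message :: term(), timestamp()}].

%% What a channel process guards.
-record(channel, {subscribers = [] :: [pid()],    % a bag: duplicates allowed
                  messages    = [] :: window()}). % sliding window of pairs

%% What the system process guards: channel name -> channel process.
-type system() :: dict:dict(string(), pid()).

-spec new_system() -> system().
new_system() -> dict:new().

-spec new_channel() -> #channel{}.
new_channel() -> #channel{}.
```

  Declarations like these do not make the lumps of code go away, but<br>
  they do give a reader one place to see what each process guards.<br>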

<br>

  This is also true in C, C++, Java, C#, JavaScript, Go,<br>

  Eiffel, Smalltalk, Prolog, Haskell, Clean, SML, ...,<br>

  not to mention Visual Basic and Fortran.<br>

<br>

  Almost the only languages I know where it doesn't *have* to<br>

  be true are Lisp, Scheme, and Lisp-Flavoured Erlang.  Arguably<br>

  Prolog *could* be in this group, but in practice it usually is<br>

  in the other camp.  Thanks to the preprocessor, C *can* be<br>

  made rather more scrutable, but for some reason this is frowned on.<br>

<br>

  There's the e2 project (<a href="http://e2project.org" target="_blank">http://e2project.org</a>) which is a step<br>

  in a good direction, but it doesn't do much about this problem.<br>

  A version of TinyMQ using e2_service instead of gen_server<br>

  would in fact exacerbate the problem by mushing<br>

  handle_call/3, handle_cast/2, and handle_info/2 into one<br>

  function, turning three lumps into one bigger lump.<br>

<br>

LUMPS OF DATA.<br>

<br>

  Take tinymq_channel_controller as an example.<br>

  Using an OTP behaviour means that all six dimensions of the state<br>

  are mushed together in one data structure.  This goes a long way<br>

  towards hiding the fact that<br>

<br>

        supervisor, channel, and max_age are never changed<br>

        messages, subscribers, and last_pull *are* changed.<br>

<br>

  One teeny tiny step here would be to offer an alternative set of<br>

  callbacks for some behaviours where the "state" is separated into<br>

  immutable "context" and mutable "state", so that it is obvious<br>

  *by construction* that the context information *can't* be changed.<br>
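<br>

  A minimal sketch of what such a split could look like follows.<br>
  It is a hand-rolled server loop, not an OTP behaviour, and the<br>
  module name, the callback shape handle_call(Request, Context, State),<br>
  and both records are hypothetical.  The point is only that the<br>
  callback can return a new State but has no way to return a new<br>
  Context.<br>

```erlang
-module(ctx_server).
-export([start/3, call/2, handle_call/3, demo/0]).

%% Hypothetical mini-behaviour, not part of OTP.  A callback module exports
%%   handle_call(Request, Context, State) -> {reply, Reply, NewState}
%% so Context is visible to it, but it cannot hand back a changed one.

-record(context, {channel, max_age}).   % fixed when the server starts
-record(state,   {messages = []}).      % the part that changes

start(Module, Context, State) ->
    spawn(fun() -> loop(Module, Context, State) end).

call(Server, Request) ->
    Ref = make_ref(),
    Server ! {call, self(), Ref, Request},
    receive {Ref, Reply} -> Reply end.

loop(Module, Context, State) ->
    receive
        {call, From, Ref, Request} ->
            {reply, Reply, NewState} =
                Module:handle_call(Request, Context, State),
            From ! {Ref, Reply},
            loop(Module, Context, NewState)   % Context passed on untouched
    end.

%% Example callbacks, living in this module only to keep the sketch short.
handle_call(max_age, #context{max_age = MaxAge}, State) ->
    {reply, MaxAge, State};
handle_call({push, Msg}, _Context, #state{messages = Ms} = State) ->
    {reply, ok, State#state{messages = [Msg | Ms]}};
handle_call(count, _Context, #state{messages = Ms} = State) ->
    {reply, length(Ms), State}.

demo() ->
    Context = #context{channel = "news", max_age = 60},
    Server = start(?MODULE, Context, #state{}),
    ok = call(Server, {push, hello}),
    {call(Server, max_age), call(Server, count)}.
```

  Here only the state record is rebuilt on each update; the context<br>
  term is simply passed along from one iteration to the next.<br>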

<br>

  Another option would be to have some way of annotating in a<br>

  -record declaration that a field cannot be updated.<br>

<br>

  I prefer the segregation approach on the grounds of no language<br>

  change being needed and the improved efficiency of not copying<br>

  fields that can't have changed.  Others might prefer the revised<br>

  -record approach on the grounds of not having to change or<br>

  duplicate the OTP behaviours.<br>

<br>

  I had to read each file in detail<br>

  - to find that certain fields *happened* not to be changed<br>

  - to understand the design well enough to tell that this was<br>

    almost certainly deliberate.<br>

<br>

WE DOCUMENT THE WRONG THINGS.<br>

<br>

  It's well known that there are two kinds of documentation,<br>

  "external" documentation for people writing clients of a module,<br>

  and "internal" documentation for people maintaining the module<br>

  itself.  It's also well known that the division is simplistic;<br>

  if the external documentation is silent about material points<br>

  you have to read the internal documentation.<br>

<br>

  In languages like Prolog and Erlang and Scheme where you build<br>

  data structures out of existing "universal" types and have no<br>

  data structure declarations, we tend to document procedures<br>

  but not data.  This is backwards.  If you understand the data,<br>

  and especially its invariants, the code is often pretty obvious.<br>

<br>

  There are two examples of this in TinyMQ.  One is specific to<br>

  TinyMQ.  The other is nearly universal in Erlang practice.<br>

<br>

  Erlang systems are made of lots of processes sending messages<br>

  to each other.  Joe Armstrong has often said THINK ABOUT THE<br>

  PROTOCOLS.  But Erlang programmers very seldom *write* about<br>

  the protocols.<br>

<br>

  Using the OTP behaviours, a "concurrent object" is implemented<br>

  as a module with a bunch of interface functions that forward<br>

  messages through the OTP layer to the callback code managed by<br>

  whatever behaviour it is.  This protocol is unique to each kind<br>

  of concurrent object.  It's often generated in one module (the<br>

  one with the interface functions) and consumed in another (the<br>

  one with the callback code), as it is in TinyMQ.  And it's not<br>

  documented.<br>

<br>

  It is possible to reconstruct this protocol by reading the code<br>

  in detail and noting down what you see.  It is troublesome when,<br>

  as in TinyMQ, the two modules disagree about the protocol.  It's<br>

  clear that _something_ is wrong, but what, exactly?<br>

<br>

  For example, tinymq_controller has a case<br>

        handle_cast({set_max_age, NewMaxAge}, State) -><br>

  but this is the only occurrence of set_max_age anywhere in TinyMQ.<br>

  Is its presence in tinymq_controller an example of dead code,<br>

  or is its absence from the rest of the application an example<br>

  of missing code?  The same question can be asked about 'expire'<br>

  (which would forget a channel without making it actually go away,<br>

   if it could ever be invoked, which it can't.)<br>

<br>

  Almost as soon as I started reading Erlang code many years ago<br>

  it seemed obvious to me that documenting (and if possible, type<br>

  checking) these internal protocols was a very important part of<br>

  Erlang internal documentation.  There must be something wrong<br>

  with my brain, because other people don't seem to feel this lack<br>

  anywhere near as strongly as I do.  I think Joe Armstrong sort<br>

  of sees this at the next level up or he would never have invented<br>

  UBF.<br>
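<br>

  One way to write such a protocol down, sketched under assumptions:<br>
  the request() type below is a hypothetical two-message protocol,<br>
  not TinyMQ's actual one, and the module name and state shape are<br>
  invented.  The type gives a single place to say what each<br>
  alternative means, and the interface functions and the callback are<br>
  both specified against it, which gives a tool such as Dialyzer a<br>
  chance to notice when the two sides drift apart.<br>

```erlang
-module(chan_protocol).
-export([subscribe/2, push/2, handle_cast/2]).
-export_type([request/0]).

%% Hypothetical protocol of a channel process (illustration only).
-type request() ::
        {subscribe, Subscriber :: pid()}   % add Subscriber to the bag
      | {push, Message :: term()}.         % add Message to the window

-type state() :: {Subscribers :: [pid()], Messages :: [term()]}.

%% Interface side: the only place where requests are built.
-spec subscribe(pid(), pid()) -> ok.
subscribe(Channel, Subscriber) ->
    send(Channel, {subscribe, Subscriber}).

-spec push(pid(), term()) -> ok.
push(Channel, Message) ->
    send(Channel, {push, Message}).

-spec send(pid(), request()) -> ok.
send(Channel, Request) ->
    gen_server:cast(Channel, Request).

%% Callback side: one clause per alternative of request(), and no others.
-spec handle_cast(request(), state()) -> {noreply, state()}.
handle_cast({subscribe, Pid}, {Subs, Msgs}) ->
    {noreply, {[Pid | Subs], Msgs}};
handle_cast({push, Msg}, {Subs, Msgs}) ->
    {noreply, {Subs, [Msg | Msgs]}}.
```

  A clause for a message that no interface function ever builds would<br>
  then stand out as not belonging to request().<br>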

<br>

  But Occam, Go, and Sing# have typed channels, so they *are*<br>

  addressing the issue, and *do* have a natural central point to<br>

  document what the alternatives of an internal protocol signify.<br>

<br>

  Another documentation failure is that we fail to document what<br>

  is not there.  In TinyMQ, a channel automatically comes into<br>

  existence when you try to use it.  Perhaps as a consequence of<br>

  this, there is no way to shut a channel down.  In TinyMQ, old<br>

  messages are not removed from a channel when they expire, but<br>

  the next time someone does a 'subscribe' (waves hands) or a 'poll'<br>

  or a 'push' *after* they expire.  So if processes stop sending<br>

  messages to and requesting messages from some channel, the last few messages,<br>

  no matter how large, may hang around forever.  I'm sure there<br>

  is a reason, but because it's a reason for something *not* being<br>

  there, there's no obvious place to hang the comment, and there<br>

  isn't one.  (Except for the dead 'expire' clause mentioned above.)<br>

<br>

IT'S HARD TO SPOT SALIENT DETAIL IN A SEA OF GLUE CODE.<br>

<br>

  The central fact about TinyMQ is that it holds the messages of<br>

  a channel in a simple list of {Message, Timestamp} pairs.  As<br>

  a result, every operation on the data takes time linear in the<br>

  current size.<br>

<br>

  This is not stated anywhere in any comments nor in the README.<br>

  You have to read the code in detail to discover this.  And it<br>

  is a rather nasty surprise.  If a channel holds N messages,<br>

  the operations *can* be done in O(log(N)) time.  (I believe it<br>

  is possible to do even better.)  Some sliding window applications<br>

  have a bound on the number of elements in the window.  This one<br>

  has a bound on the age of elements, but they could arrive at a<br>

  very high rate, so N *could* get large.<br>
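<br>

  As a rough sketch of the O(log(N)) version, the window can be kept<br>
  in a gb_trees tree keyed on {Timestamp, Seq}.  Nothing here comes<br>
  from TinyMQ or from an existing library: the module name, the<br>
  function names, and the use of a sequence number to keep equal<br>
  timestamps distinct are all assumptions.<br>

```erlang
-module(sliding_window).
-export([new/0, push/3, expire/2, since/2, count/1]).

%% Hypothetical sketch, not an existing OTP library.
%% A window is {NextSeq, Tree}; keys are {Timestamp, Seq} so that
%% two messages with the same timestamp do not collide.

new() -> {0, gb_trees:empty()}.

%% Add a message: O(log N).
push(Message, Timestamp, {Seq, Tree}) ->
    {Seq + 1, gb_trees:insert({Timestamp, Seq}, Message, Tree)}.

%% Drop every message with Timestamp < Cutoff: O(log N) per message dropped.
expire(Cutoff, {Seq, Tree}) ->
    {Seq, drop_older(Cutoff, Tree)}.

drop_older(Cutoff, Tree) ->
    case gb_trees:is_empty(Tree) of
        true ->
            Tree;
        false ->
            case gb_trees:smallest(Tree) of
                {{Timestamp, _Seq}, _Msg} when Timestamp < Cutoff ->
                    {_Key, _Value, Rest} = gb_trees:take_smallest(Tree),
                    drop_older(Cutoff, Rest);
                _ ->
                    Tree
            end
    end.

%% Messages with Timestamp >= T, oldest first.  Seq is never negative,
%% so {T, 0} sorts at or before every key with that timestamp.
since(T, {_Seq, Tree}) ->
    collect(gb_trees:next(gb_trees:iterator_from({T, 0}, Tree))).

collect(none) ->
    [];
collect({{Timestamp, _Seq}, Message, Iter}) ->
    [{Message, Timestamp} | collect(gb_trees:next(Iter))].

count({_Seq, Tree}) -> gb_trees:size(Tree).
```

  Usage would be along the lines of push/3 on every new message,<br>
  expire/2 with (now - max_age) before answering a request, and<br>
  since/2 to answer a poll.<br>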

<br>

  It is very easy to implement the necessary operations using lists,<br>

  so much so that they are present in several copies.  Revising the<br>

  TinyMQ implementation to work better with long queues would be<br>

  harder than necessary because of this.  And this goes unnoticed<br>

  because there is so much glue code for the guts to get lost in.<br>

<br>

  Given that Evan Miller took the trouble to use library components<br>

  for structuring this application, why didn't he take the next step,<br>

  and use the existing 'sliding window' library data structure?<br>

<br>

        Because there is none!<br>

<br>

  Yet sliding windows of one sort or another have come up before in<br>

  this mailing list.  Perhaps we should have a Wiki page on<br>

  trapexit to gather requirements for one or more sliding window<br>

  libraries.  Or perhaps not.  "true religion jeans for women" --<br>

  what has that or "Cheap Nike Shoes" to do with Erlang/OTP<br>

  (<a href="http://www.trapexit.org/forum/viewforum.php?f=20" target="_blank">http://www.trapexit.org/forum/viewforum.php?f=20</a>)?<br>

<br>

<br>

<br>

<br>

<br>

_______________________________________________<br>

erlang-questions mailing list<br>

<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>

<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>

</blockquote></div><br></div>