[erlang-questions] What I dislike about Erlang

Fri Aug 31 20:16:15 CEST 2012

This of course wasn't meant to imply that TinyMQ was early or raw!

On Fri, Aug 31, 2012 at 1:14 PM, Garrett Smith <g@REDACTED> wrote:
> It's fun to watch this process. Back in the days when most source code
> was closed, you'd never get to see the iterations of maturing code.
>
> Now we write something and then put it on a publicly accessible server
> where people can see it in its earliest, rawest form.
>
> And to get minds like RoK and Joe to weigh in -- it's very special I think.
>
> On Fri, Aug 31, 2012 at 12:42 PM, Evan Miller <emmiller@REDACTED> wrote:
>> Richard,
>>
>> Thanks for your comments. To preface, I plead guilty to charges of
>> gross negligence in failing to document TinyMQ's internals. This was
>> laziness on my part.
>>
>> I released TinyMQ only because I felt guilty for sitting on the code
>> for about a year. Like many open-source programmers, I have a lot of
>> demands on my attention, and it is not clear in advance what
>> documentation is actually worth writing. The @spec and @doc strings
>> for the public API seemed like a good start. But if it turned out that
>> no one was interested in using the library in the first place, why
>> should I bother documenting internal protocols and data structures?
>> I've wasted many hours in the past documenting, refactoring, and
>> generally cleaning up application internals for the benefit of
>> nebulous "others", only to receive zero patches and no indication that
>> any of my efforts were of any assistance to anyone.
>>
>> So in the spirit of your capitalized complaints, I will just say:
>>
>> ALL YOU HAVE TO DO IS ASK
>>
>> Want to know about the big-O performance characteristics? Just ask.
>> Want to know how channel creation works? Just ask. As a lazy person,
>> if a few people ask me the same thing I'll usually add a note to the
>> README in order to avert future emails from strangers. We all like a
>> well-documented project, but without feedback and communication it is
>> not clear where one's efforts are best spent on a project that doesn't
>> have an explicit client. If I knew in advance who would be using and
>> reading the code (i.e. if I wrote this code for an employer), I would
>> put more effort into writing documents for that specific audience. But
>> as a rule, if I am just putting some code "out there", I would rather
>> wait and see what people would like to know about, rather than
>> pre-emptively document every thought that has ever occurred to me
>> relating to the code base.
>>
>> Now, I know you were not trying to pick on TinyMQ, and your interest
>> is more in how Erlang tends to result in lumps of code that obscure
>> key characteristics of the application. I agree with the assessment,
>> but I am not quite as hopeless about the situation.
>>
>> I would like to see the development of graphical tools that let you
>> see in an instant how applications are structured and how they behave.
>> I am thinking of something like Pman on steroids, where I can *watch*
>> messages travel between processes, *inspect* gen_server state, and
>> *test* the system by seeing the result of single function calls or
>> many (load-testing). I'd like to be able to do all this with my mouse,
>> and generally get the feeling that I am watching the operation of a
>> machine that *shows* me how messages are passed, processes are
>> created, and state is updated.
>>
>> Did anyone else ever play Marble Drop from Maxis in the late 90s? That
>> is the kind of interface I would like to see for the Erlang run-time.
>>
>> For now, I'll update the README.
>>
>> Evan
>>
>> On Fri, Aug 31, 2012 at 1:20 AM, Richard O'Keefe <ok@REDACTED> wrote:
>>> We've just had a thread about what people like about Erlang.
>>> We also had the announcement of TinyMQ.
>>> So I'm going to use this as an example of what's *really*
>>> wrong with Erlang.
>>>
>>> Don't get me wrong.  I endorse everything everyone else has
>>> said in favour of Erlang.  Erlang is like democracy: the worst
>>> thing in its class except for all the others, and something
>>> that is increasingly imitated by people who just don't get
>>> some of the fundamental things about it.
>>>
>>> I also endorse what people have said in praise of TinyMQ.
>>> There are lots of things that it does right:
>>>  - there is a README
>>>  - there are EDoc comments with @specs for the public
>>>    interface
>>>  - the functions and variables are named well enough that
>>>    I was never in doubt about what any part of the code was
>>>    up to, at least not for longer than a second or two
>>>  - the hard work of process management is delegated to OTP
>>>    behaviours
>>> At this point, it's looking better than anything I've written.
>>>
>>> Make no mistake: I am not saying that Erlang or TinyMQ are *bad*.
>>> They are good things; I'm just ranting somewhat vaguely about
>>> why they should be better.
>>>
>>>
>>> LUMPS OF INDISTINGUISHABLE CODE.
>>>
>>>   Up to a certain level of hand-waving, TinyMQ can be roughly
>>>   understood thus:
>>>         The TinyMQ *system* is a monitor
>>>         guarding a dictionary mapping strings to channels,
>>>   where
>>>         a channel is a monitor
>>>         guarding a bag of subscribers and
>>>         a sliding window of {Message, Timestamp} pairs.
>>>
>>>   YOU CANNOT SEE THIS AT A GLANCE.
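
The shape described above can at least be written down as type declarations. The following is only a sketch: the module name, the type names, and the use of a map (which needs a modern OTP release) are assumptions made for illustration, not TinyMQ's actual definitions. In the running system the dictionary would presumably hold the pids of channel processes rather than channel values.

```erlang
-module(tinymq_shape).
-export([new_system/0, new_channel/0]).
-export_type([system/0, channel/0]).

%% Hypothetical names throughout; this mirrors the prose description,
%% not TinyMQ's real record definitions.
-type timestamp()  :: integer().
-type message()    :: term().
-type subscriber() :: pid().

%% A channel guards a bag of subscribers and a sliding window
%% of {Message, Timestamp} pairs.
-record(channel, {subscribers = [] :: [subscriber()],
                  window      = [] :: [{message(), timestamp()}]}).
-type channel() :: #channel{}.

%% The system guards a dictionary mapping channel names (strings)
%% to channels.
-type system() :: #{string() => channel()}.

-spec new_system() -> system().
new_system() -> #{}.

-spec new_channel() -> channel().
new_channel() -> #channel{}.
```

Declarations like these do not remove the glue code, but they give a reader one place to look for the overall structure.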
>>>
>>>   This is not Evan Miller's fault.  *Anything* you write in
>>>   Erlang is going to end up as lumps of indistinguishable code,
>>>   because there is nothing else for it to be.
>>>
>>>   This is also true in C, C++, Java, C#, Javascript, Go,
>>>   Eiffel, Smalltalk, Prolog, Haskell, Clean, SML, ...,
>>>   not to mention Visual Basic and Fortran.
>>>
>>>   Almost the only languages I know where it doesn't *have* to
>>>   be true are Lisp, Scheme, and Lisp-Flavoured Erlang.  Arguably
>>>   Prolog *could* be in this group, but in practice it usually is
>>>   in the other camp.  Thanks to the preprocessor, C *can* be
>>>   made rather more scrutable, but for some reason this is frowned on.
>>>
>>>   There's the e2 project (http://e2project.org) which is a step
>>>   in a good direction, but it doesn't do much about this problem.
>>>   A version of TinyMQ using e2_service instead of gen_server
>>>   would in fact exacerbate the problem by mushing
>>>   handle_call/3, handle_cast/2, and handle_info/2 into one
>>>   function, turning three lumps into one bigger lump.
>>>
>>> LUMPS OF DATA.
>>>
>>>   Take tinymq_channel_controller as an example.
>>>   Using an OTP behaviour means that all six dimensions of the state
>>>   are mushed together in one data structure.  This goes a long way
>>>   towards hiding the fact that
>>>
>>>         supervisor, channel, and max_age are never changed
>>>         messages, subscribers, and last_pull *are* changed.
>>>
>>>   One teeny tiny step here would be to offer an alternative set of
>>>   callbacks for some behaviours where the "state" is separated into
>>>   immutable "context" and mutable "state", so that it is obvious
>>>   *by construction* that the context information *can't* be changed.
>>>
>>>   Another option would be to have some way of annotating in a
>>>   -record declaration that a field cannot be updated.
>>>
>>>   I prefer the segregation approach on the grounds of no language
>>>   change being needed and the improved efficiency of not copying
>>>   fields that can't have changed.  Others might prefer the revised
>>>   -record approach on the grounds of not having to change or
>>>   duplicate the OTP behaviours.
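
A minimal sketch of the segregation approach, using the six field names listed above. The module name, the callback shape (a handler that takes the context as an extra argument), and the numeric timestamps are assumptions for illustration; no existing OTP behaviour offers this callback set.

```erlang
-module(ctx_state_sketch).
-export([init/3, handle_push/3]).

%% Immutable context: set once at start-up, never rebuilt afterwards.
-record(context, {supervisor, channel, max_age}).

%% Mutable state: the only record a handler is allowed to return.
-record(state, {messages = [], subscribers = [], last_pull}).

%% Build both halves once; the context is never returned by a handler.
init(Supervisor, Channel, MaxAge) ->
    {#context{supervisor = Supervisor, channel = Channel, max_age = MaxAge},
     #state{}}.

%% The handler reads the context but returns only new state, so it
%% cannot change max_age even by accident.
handle_push({Message, Now}, #context{max_age = MaxAge},
            #state{messages = Msgs} = State) ->
    %% Keep only the messages that are still within max_age of Now.
    Fresh = [{M, T} || {M, T} <- Msgs, Now - T =< MaxAge],
    {noreply, State#state{messages = [{Message, Now} | Fresh]}}.
```

Because the return value carries only the state record, an update copies three fields rather than six, and a reader can see which fields are fixed without tracing every clause.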
>>>
>>>   I had to read each file in detail
>>>   - to find that certain fields *happened* not to be changed
>>>   - to understand the design well enough to tell that this was
>>>     almost certainly deliberate.
>>>
>>> WE DOCUMENT THE WRONG THINGS.
>>>
>>>   It's well known that there are two kinds of documentation,
>>>   "external" documentation for people writing clients of a module,
>>>   and "internal" documentation for people maintaining the module
>>>   itself.  It's also well known that the division is simplistic;
>>>   if the external documentation is silent about material points
>>>   you have to read the internal documentation.
>>>
>>>   In languages like Prolog and Erlang and Scheme where you build
>>>   data structures out of existing "universal" types and have no
>>>   data structure declarations, we tend to document procedures
>>>   but not data.  This is backwards.  If you understand the data,
>>>   and especially its invariants, the code is often pretty obvious.
>>>
>>>   There are two examples of this in TinyMQ.  One is specific to
>>>   TinyMQ.  The other is nearly universal in Erlang practice.
>>>
>>>   Erlang systems are made of lots of processes sending messages
>>>   to each other.  Joe Armstrong has often said THINK ABOUT THE
>>>   PROTOCOLS.  But Erlang programmers very seldom *write* about
>>>   the protocols.
>>>
>>>   Using the OTP behaviours, a "concurrent object" is implemented
>>>   as a module with a bunch of interface functions that forward
>>>   messages through the OTP layer to the callback code managed by
>>>   whatever behaviour it is.  This protocol is unique to each kind
>>>   of concurrent object.  It's often generated in one module (the
>>>   one with the interface functions) and consumed in another (the
>>>   one with the callback code), as it is in TinyMQ.  And it's not
>>>   documented.
>>>
>>>   It is possible to reconstruct this protocol by reading the code
>>>   in detail and noting down what you see.  It is troublesome when,
>>>   as in TinyMQ, the two modules disagree about the protocol.  It's
>>>   clear that _something_ is wrong, but what, exactly?
>>>
>>>   For example, tinymq_controller has a case
>>>         handle_cast({set_max_age, NewMaxAge}, State) ->
>>>   but this is the only occurrence of set_max_age anywhere in TinyMQ.
>>>   Is its presence in tinymq_controller an example of dead code,
>>>   or is its absence from the rest of the application an example
>>>   of missing code?  The same question can be asked about 'expire'
>>>   (which would forget a channel without making it actually go away,
>>>    if it could ever be invoked, which it can't.)
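
One way to write such a protocol down is a single type listing every message the callback module accepts, kept beside the interface functions that build those messages. This is a sketch only: the module name and the payload types are guesses, and only the two tags set_max_age and expire come from the text above.

```erlang
-module(protocol_sketch).
-export([set_max_age/2, expire/2]).
-export_type([cast_msg/0]).

%% Every cast the controller is meant to accept, in one place.
%% The payload types are assumptions made for illustration.
-type cast_msg() :: {set_max_age, MaxAge :: non_neg_integer()}
                  | {expire, Channel :: string()}.

%% Interface functions: the only places these messages are built.
-spec set_max_age(pid(), non_neg_integer()) -> ok.
set_max_age(Controller, MaxAge) ->
    gen_server:cast(Controller, {set_max_age, MaxAge}).

-spec expire(pid(), string()) -> ok.
expire(Controller, Channel) ->
    gen_server:cast(Controller, {expire, Channel}).
```

With the alternatives listed once, a handle_cast/2 clause with no matching interface function (or the reverse) shows up as a mismatch against the type instead of having to be found by reading both modules.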
>>>
>>>   Almost as soon as I started reading Erlang code many years ago
>>>   it seemed obvious to me that documenting (and if possible, type
>>>   checking) these internal protocols was a very important part of
>>>   Erlang internal documentation.  There must be something wrong
>>>   with my brain, because other people don't seem to feel this lack
>>>   anywhere near as strongly as I do.  I think Joe Armstrong sort
>>>   of sees this at the next level up or he would never have invented
>>>   UBF.
>>>
>>>   But Occam, Go, and Sing# have typed channels, so they *are*
>>>   addressing the issue, and *do* have a natural central point to
>>>   document what the alternatives of an internal protocol signify.
>>>
>>>   Another documentation failure is that we fail to document what
>>>   is not there.  In TinyMQ, a channel automatically comes into
>>>   existence when you try to use it.  Perhaps as a consequence of
>>>   this, there is no way to shut a channel down.  In TinyMQ, old
>>>   messages are not removed from a channel when they expire, but
>>>   the next time someone does a 'subscribe' (waves hands) or a 'poll'
>>>   or a 'push' *after* they expire.  So if processes stop sending
>>>   and requesting messages to some channel, the last few messages,
>>>   no matter how large, may hang around forever.  I'm sure there
>>>   is a reason, but because it's a reason for something *not* being
>>>   there, there's no obvious place to hang the comment, and there
>>>   isn't one.  (Except for the dead 'expire' clause mentioned above.)
>>>
>>> IT'S HARD TO SPOT SALIENT DETAIL IN A SEA OF GLUE CODE.
>>>
>>>   The central fact about TinyMQ is that it holds the messages of
>>>   a channel in a simple list of {Message, Timestamp} pairs.  As
>>>   a result, every operation on the data takes time linear in the
>>>   current size.
>>>
>>>   This is not stated anywhere in any comments nor in the README.
>>>   You have to read the code in detail to discover this.  And it
>>>   is a rather nasty surprise.  If a channel holds N messages,
>>>   the operations *can* be done in O(log(N)) time.  (I believe it
>>>   is possible to do even better.)  Some sliding window applications
>>>   have a bound on the number of elements in the window.  This one
>>>   has a bound on the age of elements, but they could arrive at a
>>>   very high rate, so N *could* get large.
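
As a rough illustration of an age-bounded window that avoids scanning the whole list, here is a sketch built on the standard queue module. The module name and API are hypothetical, and it assumes messages arrive in non-decreasing timestamp order so that the oldest entry is always at the front. It covers only insertion and expiry; looking up all messages since a given timestamp would need an ordered structure such as gb_trees.

```erlang
-module(sliding_window).
-export([new/1, push/3, expire/2, to_list/1]).

%% Hypothetical API. Timestamps are plain numbers in the same unit
%% as MaxAge.
-record(window, {max_age, items = queue:new()}).

new(MaxAge) -> #window{max_age = MaxAge}.

%% Add a message at the rear, then drop anything that is too old.
push(Message, Now, #window{items = Q} = W) ->
    expire(Now, W#window{items = queue:in({Message, Now}, Q)}).

%% Remove expired entries from the front only; stop at the first
%% entry that is still fresh, so fresh entries are never scanned.
expire(Now, #window{max_age = MaxAge, items = Q} = W) ->
    case queue:peek(Q) of
        {value, {_, T}} when Now - T > MaxAge ->
            expire(Now, W#window{items = queue:drop(Q)});
        _ ->
            W
    end.

%% Oldest first.
to_list(#window{items = Q}) -> queue:to_list(Q).
```

Each message is enqueued once and dequeued at most once, so the cost per message is amortized constant rather than linear in the window size.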
>>>
>>>   It is very easy to implement the necessary operations using lists,
>>>   so much so that they are present in several copies.  Revising the
>>>   TinyMQ implementation to work better with long queues would be
>>>   harder than necessary because of this.  And this goes un-noticed
>>>   because there is so much glue code for the guts to get lost in.
>>>
>>>   Given that Evan Miller took the trouble to use library components
>>>   for structuring this application, why didn't he take the next step,
>>>   and use the existing 'sliding window' library data structure?
>>>
>>>         Because there is none!
>>>
>>>   Yet sliding windows of one sort or another have come up before in
>>>   this mailing list.  Perhaps we should have a Wiki page on
>>>   trapexit to gather requirements for one or more sliding window
>>>   libraries.  Or perhaps not.  "true religion jeans for women" --
>>>   what has that or "Cheap Nike Shoes" to do with Erlang/OTP
>>>   (http://www.trapexit.org/forum/viewforum.php?f=20)?
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>>
>> --
>> Evan Miller
>> http://www.evanmiller.org/