[erlang-questions] What I dislike about Erlang

Fri Aug 31 16:20:31 CEST 2012

On Fri, Aug 31, 2012 at 8:20 AM, Richard O'Keefe <ok@REDACTED> wrote:
> We've just had a thread about what people like about Erlang.
> We also had the announcement of TinyMQ.
> So I'm going to use this as an example of what's *really*
> wrong with Erlang.
>
> Don't get me wrong.  I endorse everything everyone else has
> said in favour of Erlang.  Erlang is like democracy: the worst
> thing in its class except for all the others, and something
> that is increasingly imitated by people who just don't get
> some of the fundamental things about it.
>
> I also endorse what people have said in praise of TinyMQ.
> There are lots of things that it does right:
>  - there is a README
>  - there are EDoc comments with @specs for the public
>    interface
>  - the functions and variables are named well enough that
>    I was never in doubt about what any part of the code was
>    up to, at least not for longer than a second or two
>  - the hard work of process management is delegated to OTP
>    behaviours
> At this point, it's looking better than anything I've written.
>
> Make no mistake: I am not saying that Erlang or TinyMQ are *bad*.
> They are good things; I'm just ranting somewhat vaguely about
> why they should be better.
>
>
> LUMPS OF INDISTINGUISHABLE CODE.
>
>   Up to a certain level of hand-waving, TinyMQ can be roughly
>   understood thus:
>         The TinyMQ *system* is a monitor
>         guarding a dictionary mapping strings to channels,
>   where
>         a channel is a monitor
>         guarding a bag of subscribers and
>         a sliding window of {Message, Timestamp} pairs.
>
>   YOU CANNOT SEE THIS AT A GLANCE.
>
>   This is not Evan Miller's fault.  *Anything* you write in
>   Erlang is going to end up as lumps of indistinguishable code,
>   because there is nothing else for it to be.
>
>   This is also true in C, C++, Java, C#, Javascript, Go,
>   Eiffel, Smalltalk, Prolog, Haskell, Clean, SML, ...,
>   not to mention Visual Basic and Fortran.
>
>   Almost the only languages I know where it doesn't *have* to
>   be true are Lisp, Scheme, and Lisp-Flavoured Erlang.  Arguably
>   Prolog *could* be in this group, but in practice it usually is
>   in the other camp.  Thanks to the preprocessor, C *can* be
>   made rather more scrutable, but for some reason this is frowned on.
>
>   There's the e2 project (http://e2project.org) which is a step
>   in a good direction, but it doesn't do much about this problem.
>   A version of TinyMQ using e2_service instead of gen_server
>   would in fact exacerbate the problem by mushing
>   handle_call/3, handle_cast/2, and handle_info/2 into one
>   function, turning three lumps into one bigger lump.
>
> LUMPS OF DATA.
>
>   Take tinymq_channel_controller as an example.
>   Using an OTP behaviour means that all six dimensions of the state
>   are mushed together in one data structure.  This goes a long way
>   towards hiding the fact that
>
>         supervisor, channel, and max_age are never changed
>         messages, subscribers, and last_pull *are* changed.
>
>   One teeny tiny step here would be to offer an alternative set of
>   callbacks for some behaviours where the "state" is separated into
>   immutable "context" and mutable "state", so that it is obvious
>   *by construction* that the context information *can't* be changed.
>
>   Another option would be to have some way of annotating in a
>   -record declaration that a field cannot be updated.
>
>   I prefer the segregation approach on the grounds of no language
>   change being needed and the improved efficiency of not copying
>   fields that can't have changed.  Others might prefer the revised
>   -record approach on the grounds of not having to change or
>   duplicate the OTP behaviours.
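To make the segregation idea concrete, here is a minimal sketch of a
server loop that keeps an immutable "context" apart from the mutable
"state". Everything in it is hypothetical: the module name ctx_demo, the
callback shapes init/1 and handle_call/3, and the example channel are
assumptions for illustration, not an existing OTP behaviour and not
TinyMQ's code. The point is only that callbacks receive the context but
never return it, so they cannot change it.

```erlang
-module(ctx_demo).
-export([start/2, call/2]).        % generic part (hypothetical, not OTP)
-export([init/1, handle_call/3]).  % example callbacks

%% Assumed callback contract:
%%   Mod:init(Args)                       -> {ok, Context, State}
%%   Mod:handle_call(Req, Context, State) -> {reply, Reply, NewState}
%% Context is handed to every callback but is never part of the result.

start(Mod, Args) ->
    spawn(fun() ->
                  {ok, Context, State} = Mod:init(Args),
                  loop(Mod, Context, State)
          end).

call(Pid, Request) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {call, self(), Ref, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Pid, Reason} ->
            exit(Reason)
    end.

loop(Mod, Context, State) ->
    receive
        {call, From, Ref, Request} ->
            {reply, Reply, NewState} =
                Mod:handle_call(Request, Context, State),
            From ! {Ref, Reply},
            loop(Mod, Context, NewState)   % Context is passed on unchanged
    end.

%% Example callbacks: max_age lives in the context, messages in the state.
init(MaxAge) ->
    {ok, {ctx, MaxAge}, []}.

handle_call({push, Msg, Now}, {ctx, MaxAge}, Msgs) ->
    Fresh = [{M, T} || {M, T} <- Msgs, Now - T =< MaxAge],
    {reply, ok, [{Msg, Now} | Fresh]};
handle_call(messages, _Context, Msgs) ->
    {reply, [M || {M, _} <- lists:reverse(Msgs)], Msgs}.
```

A reader of the callbacks can see from the return shapes alone that
max_age is fixed for the life of the process.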
>
>   I had to read each file in detail
>   - to find that certain fields *happened* not to be changed
>   - to understand the design well enough to tell that this was
>     almost certainly deliberate.
>
> WE DOCUMENT THE WRONG THINGS.
>
>   It's well known that there are two kinds of documentation,
>   "external" documentation for people writing clients of a module,
>   and "internal" documentation for people maintaining the module
>   itself.  It's also well known that the division is simplistic;
>   if the external documentation is silent about material points
>   you have to read the internal documentation.

Thank you.

I was wondering - perhaps it is wrong to publish source code - If you
have to read the internal documentation to understand the external
behavior of a program then the external documentation is not good
enough.

Dare you publish only binary code and documentation of the interfaces?

I find that large numbers of programs cannot be understood by reading
the documentation of the interfaces - you have to read the internal
documentation and (horrors) the code.

Reading code is no fun - since you always wonder *why* they wrote it
that way, and not some other way and you get tempted to change it.

I think you should only publish binary code and external
documentation. If a user wants to know how to use the code and has to
ask then you have failed to document your code.

If a user wants to see the code, not because they wish to use it, but
because they wish to see how you solved the problem - then you can let
them see the code.

Programs and code are supposed to be black boxes. If you have to open
the black box and peep inside then they are not black boxes any more.

The practice of reading code to figure out how to use the code is
crazy and an incredible waste of time.

When programming I spend most of my time fixing things that should not
be broken and figuring out stuff that should be documented.

I have said many times - code is the result of research - It might
take me hours of research to write twenty lines of code. If I publish
the 20 lines and throw away the research I am doing nobody a favor.

Programs should be released with all the necessary documents needed to
understand the code.

/Joe

>
>   In languages like Prolog and Erlang and Scheme where you build
>   data structures out of existing "universal" types and have no
>   data structure declarations, we tend to document procedures
>   but not data.  This is backwards.  If you understand the data,
>   and especially its invariants, the code is often pretty obvious.
>
>   There are two examples of this in TinyMQ.  One is specific to
>   TinyMQ.  The other is nearly universal in Erlang practice.
>
>   Erlang systems are made of lots of processes sending messages
>   to each other.  Joe Armstrong has often said THINK ABOUT THE
>   PROTOCOLS.  But Erlang programmers very seldom *write* about
>   the protocols.
>
>   Using the OTP behaviours, a "concurrent object" is implemented
>   as a module with a bunch of interface functions that forward
>   messages through the OTP layer to the callback code managed by
>   whatever behaviour it is.  This protocol is unique to each kind
>   of concurrent object.  It's often generated in one module (the
>   one with the interface functions) and consumed in another (the
>   one with the callback code), as it is in TinyMQ.  And it's not
>   documented.
>
>   It is possible to reconstruct this protocol by reading the code
>   in detail and noting down what you see.  It is troublesome when,
>   as in TinyMQ, the two modules disagree about the protocol.  It's
>   clear that _something_ is wrong, but what, exactly?
>
>   For example, tinymq_controller has a case
>         handle_cast({set_max_age, NewMaxAge}, State) ->
>   but this is the only occurrence of set_max_age anywhere in TinyMQ.
>   Is its presence in tinymq_controller an example of dead code,
>   or is its absence from the rest of the application an example
>   of missing code?  The same question can be asked about 'expire'
>   (which would forget a channel without making it actually go away,
>    if it could ever be invoked, which it can't.)
>
>   Almost as soon as I started reading Erlang code many years ago
>   it seemed obvious to me that documenting (and if possible, type
>   checking) these internal protocols was a very important part of
>   Erlang internal documentation.  There must be something wrong
>   with my brain, because other people don't seem to feel this lack
>   anywhere nearly as strongly as I do.  I think Joe Armstrong sort
>   of sees this at the next level up or he would never have invented
>   UBF.
>
>   But Occam, Go, and Sing# have typed channels, so they *are*
>   addressing the issue, and *do* have a natural central point to
>   document what the alternatives of an internal protocol signify.
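One place an internal protocol could be written down in Erlang itself is
a set of -type declarations shared by the interface module and the
callback module. The sketch below is illustrative only: the module name
channel_protocol and the shape of every message are assumptions, not
TinyMQ's actual protocol. Only the names subscribe, poll, push,
set_max_age and expire come from the discussion here.

```erlang
-module(channel_protocol).
-export_type([call/0, cast/0, timestamp/0]).

-type timestamp() :: non_neg_integer().

%% Requests expected through gen_server:call/2.
%% (Message shapes are hypothetical.)
-type call() ::
        {subscribe, Since :: timestamp(), Subscriber :: pid()}
      | {poll, Since :: timestamp()}
      | {push, Message :: term()}.

%% Requests expected through gen_server:cast/2.
%% (Message shapes are hypothetical.)
-type cast() ::
        {set_max_age, NewMaxAge :: non_neg_integer()}
      | expire.
```

With the alternatives listed in one spot, a clause that handles a message
nobody sends, or a sender with no matching clause, at least has somewhere
to be noticed and commented on.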
>
>   Another documentation failure is that we fail to document what
>   is not there.  In TinyMQ, a channel automatically comes into
>   existence when you try to use it.  Perhaps as a consequence of
>   this, there is no way to shut a channel down.  In TinyMQ, old
>   messages are not removed from a channel when they expire, but
>   the next time someone does a 'subscribe' (waves hands) or a 'poll'
>   or a 'push' *after* they expire.  So if processes stop sending
>   and requesting messages to some channel, the last few messages,
>   no matter how large, may hang around forever.  I'm sure there
>   is a reason, but because it's a reason for something *not* being
>   there, there's no obvious place to hang the comment, and there
>   isn't one.  (Except for the dead 'expire' clause mentioned above.)
>
> IT'S HARD TO SPOT SALIENT DETAIL IN A SEA OF GLUE CODE.
>
>   The central fact about TinyMQ is that it holds the messages of
>   a channel in a simple list of {Message, Timestamp} pairs.  As
>   a result, every operation on the data takes time linear in the
>   current size.
>
>   This is not stated anywhere in any comments nor in the README.
>   You have to read the code in detail to discover this.  And it
>   is a rather nasty surprise.  If a channel holds N messages,
>   the operations *can* be done in O(log(N)) time.  (I believe it
>   is possible to do even better.)  Some sliding window applications
>   have a bound on the number of elements in the window.  This one
>   has a bound on the age of elements, but they could arrive at a
>   very high rate, so N *could* get large.
>
>   It is very easy to implement the necessary operations using lists,
>   so much so that they are present in several copies.  Revising the
>   TinyMQ implementation to work better with long queues would be
>   harder than necessary because of this.  And this goes un-noticed
>   because there is so much glue code for the guts to get lost in.
>
>   Given that Evan Miller took the trouble to use library components
>   for structuring this application, why didn't he take the next step,
>   and use the existing 'sliding window' library data structure?
>
>         Because there is none!
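For illustration, here is a minimal sketch of what an age-bounded
sliding window might look like when built on the standard queue module.
It is not an existing library and not TinyMQ's code; the module name
sliding_window and its function names are hypothetical. It assumes
timestamps are integers that arrive in non-decreasing order, so the
oldest pair is always at the front and expiry only ever has to look
there. It does not attempt the "messages since time T" lookup.

```erlang
-module(sliding_window).
-export([new/1, push/3, expire/2, to_list/1]).

%% A window is {MaxAge, Queue}.  Queue holds {Message, Timestamp} pairs
%% with the oldest pair at the front.

new(MaxAge) when is_integer(MaxAge), MaxAge >= 0 ->
    {MaxAge, queue:new()}.

%% Add Message stamped Now at the rear, then drop whatever has expired.
push(Message, Now, {MaxAge, Q}) ->
    expire(Now, {MaxAge, queue:in({Message, Now}, Q)}).

%% Drop pairs from the front while they are older than MaxAge.
%% Relies on the assumption that timestamps never decrease.
expire(Now, {MaxAge, Q} = Window) ->
    case queue:peek(Q) of
        {value, {_Message, T}} when Now - T > MaxAge ->
            expire(Now, {MaxAge, queue:drop(Q)});
        _ ->
            Window
    end.

%% Oldest pair first.
to_list({_MaxAge, Q}) ->
    queue:to_list(Q).
```

Under that ordering assumption, adding a message and discarding expired
ones cost amortised constant time per message, rather than a pass over
the whole list.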
>
>   Yet sliding windows of one sort or another have come up before in
>   this mailing list.  Perhaps we should have a Wiki page on
>   trapexit to gather requirements for one or more sliding window
>   libraries.  Or perhaps not.  "true religion jeans for women" --
>   what has that or "Cheap Nike Shoes" to do with Erlang/OTP
>   (http://www.trapexit.org/forum/viewforum.php?f=20)?
>