[erlang-questions] What I dislike about Erlang
Garrett Smith
g@REDACTED
Fri Aug 31 20:16:15 CEST 2012
This of course wasn't meant to imply that TinyMQ was early or raw!
On Fri, Aug 31, 2012 at 1:14 PM, Garrett Smith <g@REDACTED> wrote:
> It's fun to watch this process. Back in the days when most source code
> was closed, you'd never get to see the iterations of maturing code.
>
> Now we write something and then put it on a publicly accessible server
> where people can see it in its earliest, rawest form.
>
> And to get minds like RoK and Joe to weigh in -- it's very special I think.
>
> On Fri, Aug 31, 2012 at 12:42 PM, Evan Miller <emmiller@REDACTED> wrote:
>> Richard,
>>
>> Thanks for your comments. To preface, I plead guilty to charges of
>> gross negligence in failing to document TinyMQ's internals. This was
>> laziness on my part.
>>
>> I released TinyMQ only because I felt guilty for sitting on the code
>> for about a year. Like many open-source programmers, I have a lot of
>> demands on my attention, and it is not clear in advance what
>> documentation is actually worth writing. The @spec and @doc strings
>> for the public API seemed like a good start. But if it turned out that
>> no one was interested in using the library in the first place, why
>> should I bother documenting internal protocols and data structures?
>> I've wasted many hours in the past documenting, refactoring, and
>> generally cleaning up application internals for the benefit of
>> nebulous "others", only to receive zero patches and no indication that
>> any of my efforts were of any assistance to anyone.
>>
>> So in the spirit of your capitalized complaints, I will just say:
>>
>> ALL YOU HAVE TO DO IS ASK
>>
>> Want to know about the big-O performance characteristics? Just ask.
>> Want to know how channel creation works? Just ask. As a lazy person,
>> if a few people ask me the same thing I'll usually add a note to the
>> README in order to avert future emails from strangers. We all like a
>> well-documented project, but without feedback and communication it is
>> not clear where one's efforts are best spent on a project that doesn't
>> have an explicit client. If I knew in advance who would be using and
>> reading the code (i.e. if I wrote this code for an employer), I would
>> put more effort into writing documents for that specific audience. But
>> as a rule, if I am just putting some code "out there", I prefer to
>> wait and see what people would like to know about, rather than
>> pre-emptively document every thought that has ever occurred to me
>> relating to the code base.
>>
>> Now, I know you were not trying to pick on TinyMQ, and your interest
>> is more in how Erlang tends to result in lumps of code that obscure
>> key characteristics of the application. I agree with the assessment,
>> but I am not quite as hopeless about the situation.
>>
>> I would like to see the development of graphical tools that let you
>> see in an instant how applications are structured and how they behave.
>> I am thinking of something like Pman on steroids, where I can *watch*
>> messages travel between processes, *inspect* gen_server state, and
>> *test* the system by seeing the result of single function calls or
>> many (load-testing). I'd like to be able to do all this with my mouse,
>> and generally get the feeling that I am watching the operation of a
>> machine that *shows* me how messages are passed, processes are
>> created, and state is updated.
>>
>> Did anyone else ever play Marble Drop from Maxis in the late 90s? That
>> is the kind of interface I would like to see for the Erlang run-time.
>>
>> For now, I'll update the README.
>>
>> Evan
>>
>> On Fri, Aug 31, 2012 at 1:20 AM, Richard O'Keefe <ok@REDACTED> wrote:
>>> We've just had a thread about what people like about Erlang.
>>> We also had the announcement of TinyMQ.
>>> So I'm going to use this as an example of what's *really*
>>> wrong with Erlang.
>>>
>>> Don't get me wrong. I endorse everything everyone else has
>>> said in favour of Erlang. Erlang is like democracy: the worst
>>> thing in its class except for all the others, and something
>>> that is increasingly imitated by people who just don't get
>>> some of the fundamental things about it.
>>>
>>> I also endorse what people have said in praise of TinyMQ.
>>> There are lots of things that it does right:
>>> - there is a README
>>> - there are EDoc comments with @specs for the public
>>>   interface
>>> - the functions and variables are named well enough that
>>>   I was never in doubt about what any part of the code was
>>>   up to, at least not for longer than a second or two
>>> - the hard work of process management is delegated to OTP
>>>   behaviours
>>> At this point, it's looking better than anything I've written.
>>>
>>> Make no mistake: I am not saying that Erlang or TinyMQ are *bad*.
>>> They are good things; I'm just ranting somewhat vaguely about
>>> why they should be better.
>>>
>>>
>>> LUMPS OF INDISTINGUISHABLE CODE.
>>>
>>> Up to a certain level of hand-waving, TinyMQ can be roughly
>>> understood thus:
>>>     The TinyMQ *system* is a monitor
>>>         guarding a dictionary mapping strings to channels,
>>>     where
>>>     a channel is a monitor
>>>         guarding a bag of subscribers and
>>>         a sliding window of {Message, Timestamp} pairs.
>>>
>>> YOU CANNOT SEE THIS AT A GLANCE.
>>>
>>> This is not Evan Miller's fault. *Anything* you write in
>>> Erlang is going to end up as lumps of indistinguishable code,
>>> because there is nothing else for it to be.
>>>
>>> This is also true in C, C++, Java, C#, JavaScript, Go,
>>> Eiffel, Smalltalk, Prolog, Haskell, Clean, SML, ...,
>>> not to mention Visual Basic and Fortran.
>>>
>>> Almost the only languages I know where it doesn't *have* to
>>> be true are Lisp, Scheme, and Lisp-Flavoured Erlang. Arguably
>>> Prolog *could* be in this group, but in practice it usually is
>>> in the other camp. Thanks to the preprocessor, C *can* be
>>> made rather more scrutable, but for some reason this is frowned on.
>>>
>>> There's the e2 project (http://e2project.org) which is a step
>>> in a good direction, but it doesn't do much about this problem.
>>> A version of TinyMQ using e2_service instead of gen_server
>>> would in fact exacerbate the problem by mushing
>>> handle_call/3, handle_cast/2, and handle_info/2 into one
>>> function, turning three lumps into one bigger lump.
>>>
>>> LUMPS OF DATA.
>>>
>>> Take tinymq_channel_controller as an example.
>>> Using an OTP behaviour means that all six dimensions of the state
>>> are mushed together in one data structure. This goes a long way
>>> towards hiding the fact that
>>>
>>> supervisor, channel, and max_age are never changed
>>> messages, subscribers, and last_pull *are* changed.
>>>
>>> One teeny tiny step here would be to offer an alternative set of
>>> callbacks for some behaviours where the "state" is separated into
>>> immutable "context" and mutable "state", so that it is obvious
>>> *by construction* that the context information *can't* be changed.
>>>
>>> Another option would be to have some way of annotating in a
>>> -record declaration that a field cannot be updated.
>>>
>>> I prefer the segregation approach on the grounds of no language
>>> change being needed and the improved efficiency of not copying
>>> fields that can't have changed. Others might prefer the revised
>>> -record approach on the grounds of not having to change or
>>> duplicate the OTP behaviours.
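
The segregation idea can be sketched without touching OTP. What follows is a minimal, hypothetical server loop, not an OTP behaviour and not TinyMQ code: the module name ctx_server, the callback shape, and the demo handler are all assumptions made for illustration. The callback is handed the context but can only return a reply and a new state, so the context cannot change by construction.

```erlang
%% Hypothetical sketch: not an OTP behaviour and not TinyMQ's code.
-module(ctx_server).
-export([start/3, call/2, demo/0]).

%% Callback :: fun((Request, Context, State) -> {Reply, NewState}).
%% The callback sees Context but has no way to hand back a new one.
start(Callback, Context, State) ->
    spawn(fun() -> loop(Callback, Context, State) end).

%% Synchronous request; the monitor turns a dead server into an exit.
call(Pid, Request) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {call, self(), Ref, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Pid, Reason} ->
            exit(Reason)
    end.

%% Context is threaded through unchanged; only State is replaced.
loop(Callback, Context, State) ->
    receive
        {call, From, Ref, Request} ->
            {Reply, NewState} = Callback(Request, Context, State),
            From ! {Ref, Reply},
            loop(Callback, Context, NewState)
    end.

%% Illustrative use: the channel name and max_age live in the context,
%% the message list lives in the state.
demo() ->
    Context = #{channel => "news", max_age => 60},
    Handler = fun({push, Msg}, _Ctx, Msgs) -> {ok, [Msg | Msgs]};
                 (max_age, #{max_age := Age}, Msgs) -> {Age, Msgs};
                 (count, _Ctx, Msgs) -> {length(Msgs), Msgs}
              end,
    Pid = start(Handler, Context, []),
    ok = call(Pid, {push, hello}),
    {call(Pid, count), call(Pid, max_age)}.
```

A reader of the handler can tell at a glance which values are fixed for the life of the process, because they never appear in a return value.
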
>>>
>>> I had to read each file in detail
>>> - to find that certain fields *happened* not to be changed
>>> - to understand the design well enough to tell that this was
>>>   almost certainly deliberate.
>>>
>>> WE DOCUMENT THE WRONG THINGS.
>>>
>>> It's well known that there are two kinds of documentation,
>>> "external" documentation for people writing clients of a module,
>>> and "internal" documentation for people maintaining the module
>>> itself. It's also well known that the division is simplistic;
>>> if the external documentation is silent about material points
>>> you have to read the internal documentation.
>>>
>>> In languages like Prolog and Erlang and Scheme where you build
>>> data structures out of existing "universal" types and have no
>>> data structure declarations, we tend to document procedures
>>> but not data. This is backwards. If you understand the data,
>>> and especially its invariants, the code is often pretty obvious.
>>>
>>> There are two examples of this in TinyMQ. One is specific to
>>> TinyMQ. The other is nearly universal in Erlang practice.
>>>
>>> Erlang systems are made of lots of processes sending messages
>>> to each other. Joe Armstrong has often said THINK ABOUT THE
>>> PROTOCOLS. But Erlang programmers very seldom *write* about
>>> the protocols.
>>>
>>> Using the OTP behaviours, a "concurrent object" is implemented
>>> as a module with a bunch of interface functions that forward
>>> messages through the OTP layer to the callback code managed by
>>> whatever behaviour it is. This protocol is unique to each kind
>>> of concurrent object. It's often generated in one module (the
>>> one with the interface functions) and consumed in another (the
>>> one with the callback code), as it is in TinyMQ. And it's not
>>> documented.
>>>
>>> It is possible to reconstruct this protocol by reading the code
>>> in detail and noting down what you see. It is troublesome when,
>>> as in TinyMQ, the two modules disagree about the protocol. It's
>>> clear that _something_ is wrong, but what, exactly?
>>>
>>> For example, tinymq_controller has a case
>>> handle_cast({set_max_age, NewMaxAge}, State) ->
>>> but this is the only occurrence of set_max_age anywhere in TinyMQ.
>>> Is its presence in tinymq_controller an example of dead code,
>>> or is its absence from the rest of the application an example
>>> of missing code? The same question can be asked about 'expire'
>>> (which would forget a channel without making it actually go away,
>>> if it could ever be invoked, which it can't.)
>>>
>>> Almost as soon as I started reading Erlang code many years ago
>>> it seemed obvious to me that documenting (and if possible, type
>>> checking) these internal protocols was a very important part of
>>> Erlang internal documentation. There must be something wrong
>>> with my brain, because other people don't seem to feel this lack
>>> anywhere near as strongly as I do. I think Joe Armstrong sort
>>> of sees this at the next level up or he would never have invented
>>> UBF.
>>>
>>> But Occam, Go, and Sing# have typed channels, so they *are*
>>> addressing the issue, and *do* have a natural central point to
>>> document what the alternatives of an internal protocol signify.
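
Even without typed channels, Erlang's type language offers one place to write an internal protocol down. The sketch below is hypothetical: the module name chan_protocol and the message shapes are illustrative assumptions, not TinyMQ's actual messages. Each alternative of the protocol gets a line and a comment, and the interface functions are the only code that builds the tuples, so a tool such as Dialyzer has something to check callers against.

```erlang
%% Hypothetical sketch: message shapes are illustrative, not TinyMQ's.
-module(chan_protocol).
-export([push/1, poll/1, set_max_age/1, describe/1]).
-export_type([call_msg/0, cast_msg/0]).

-type timestamp() :: non_neg_integer().

%% Synchronous requests (payloads for gen_server:call/2).
-type call_msg() :: {push, Message :: term()}       % append a message
                  | {poll, Since :: timestamp()}.   % messages newer than Since

%% Asynchronous requests (payloads for gen_server:cast/2).
-type cast_msg() :: {set_max_age, Seconds :: pos_integer()}.

%% Constructors: the only place these tuples are built.
-spec push(term()) -> call_msg().
push(Message) -> {push, Message}.

-spec poll(timestamp()) -> call_msg().
poll(Since) -> {poll, Since}.

-spec set_max_age(pos_integer()) -> cast_msg().
set_max_age(Seconds) -> {set_max_age, Seconds}.

%% One clause per protocol alternative, so a message that is produced
%% but never consumed (or the reverse) shows up as a mismatch here.
-spec describe(call_msg() | cast_msg()) -> atom().
describe({push, _}) -> push;
describe({poll, _}) -> poll;
describe({set_max_age, _}) -> set_max_age.
```

With the alternatives listed in one type, a clause like the lone set_max_age case would have an obvious home: either the type mentions it and something should send it, or it does not and the clause is dead.
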
>>>
>>> Another documentation failure is that we fail to document what
>>> is not there. In TinyMQ, a channel automatically comes into
>>> existence when you try to use it. Perhaps as a consequence of
>>> this, there is no way to shut a channel down. In TinyMQ, old
>>> messages are not removed from a channel when they expire, but
>>> the next time someone does a 'subscribe' (waves hands) or a 'poll'
>>> or a 'push' *after* they expire. So if processes stop sending
>>> messages to, and requesting messages from, some channel, the last
>>> few messages,
>>> no matter how large, may hang around forever. I'm sure there
>>> is a reason, but because it's a reason for something *not* being
>>> there, there's no obvious place to hang the comment, and there
>>> isn't one. (Except for the dead 'expire' clause mentioned above.)
>>>
>>> IT'S HARD TO SPOT SALIENT DETAIL IN A SEA OF GLUE CODE.
>>>
>>> The central fact about TinyMQ is that it holds the messages of
>>> a channel in a simple list of {Message, Timestamp} pairs. As
>>> a result, every operation on the data takes time linear in the
>>> current size.
>>>
>>> This is not stated anywhere in any comments nor in the README.
>>> You have to read the code in detail to discover this. And it
>>> is a rather nasty surprise. If a channel holds N messages,
>>> the operations *can* be done in O(log(N)) time. (I believe it
>>> is possible to do even better.) Some sliding window applications
>>> have a bound on the number of elements in the window. This one
>>> has a bound on the age of elements, but they could arrive at a
>>> very high rate, so N *could* get large.
>>>
>>> It is very easy to implement the necessary operations using lists,
>>> so much so that they are present in several copies. Revising the
>>> TinyMQ implementation to work better with long queues would be
>>> harder than necessary because of this. And this goes unnoticed
>>> because there is so much glue code for the guts to get lost in.
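
One way to keep the window operations in a single place is a small module of its own. The sketch below is an assumption-laden illustration, not TinyMQ's implementation and not an existing library: the module name sliding_window and its API are invented here, and it assumes timestamps never decrease. It uses the standard queue module, so adding a message and dropping expired ones from the front should cost roughly amortized constant time per message. It does not attempt the O(log(N)) lookup of "messages since T"; to_list/1 is still linear.

```erlang
%% Hypothetical sketch of an age-bounded sliding window.
-module(sliding_window).
-export([new/1, push/3, expire/2, to_list/1]).

-record(window, {max_age, items}).

%% MaxAge is in the same unit as the timestamps passed to push/3.
new(MaxAge) ->
    #window{max_age = MaxAge, items = queue:new()}.

%% Append {Message, Now} at the rear, then drop anything too old.
%% Assumes Now is never smaller than an earlier timestamp.
push(Message, Now, W = #window{items = Q}) ->
    expire(Now, W#window{items = queue:in({Message, Now}, Q)}).

%% Drop pairs from the front while the oldest one exceeds max_age.
expire(Now, W = #window{max_age = MaxAge, items = Q}) ->
    case queue:peek(Q) of
        {value, {_Msg, T}} when Now - T > MaxAge ->
            expire(Now, W#window{items = queue:drop(Q)});
        _ ->
            W
    end.

%% Oldest first, as {Message, Timestamp} pairs.
to_list(#window{items = Q}) ->
    queue:to_list(Q).
```

Because expire/2 is exported, a timer could call it to drop stale messages from an idle channel instead of waiting for the next push or poll.
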
>>>
>>> Given that Evan Miller took the trouble to use library components
>>> for structuring this application, why didn't he take the next step,
>>> and use the existing 'sliding window' library data structure?
>>>
>>> Because there is none!
>>>
>>> Yet sliding windows of one sort or another have come up before in
>>> this mailing list. Perhaps we should have a Wiki page on
>>> trapexit to gather requirements for one or more sliding window
>>> libraries. Or perhaps not. "true religion jeans for women" --
>>> what has that or "Cheap Nike Shoes" to do with Erlang/OTP
>>> (http://www.trapexit.org/forum/viewforum.php?f=20)?
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>>
>> --
>> Evan Miller
>> http://www.evanmiller.org/