[erlang-questions] Erlang documentation -- a modest proposal

Tue Sep 27 08:59:58 CEST 2016

Greetings,

My manager uses mentimeter.com to poll us (the unit) during unit meetings.

bengt

On 09/27/2016 08:51 AM, Joe Armstrong wrote:
> Best of lists
>
> Can we run some quick polls?
>
> I'd like to ask as many users as possible
>
>      - the best modules to learn
>      - the best functions (per module)
>
> We could set up a poll and vote for our favorite modules
>
> I'd also like to ask the experience level of the responder
>
> Then we could concentrate our documentation efforts on the most
> popular functions (or is there a better way to discover the best-of
> list?)
>
> Is there a good web thingummyjig for this?
>
> /Joe
>
> On Tue, Sep 27, 2016 at 8:26 AM, Joe Armstrong <erlang@REDACTED> wrote:
>> On Mon, Sep 26, 2016 at 10:51 PM, Lukas Larsson <lukas@REDACTED> wrote:
>>> Hello Joe,
>>>
>>> What you are talking about is lowering the cost of entry when doing
>>> documentation updates. This is a great effort and one which we should really
>>> pursue, but as you say it is a lot of work. Is there really a *need* for
>>> that to be in place first? Can't there be lots of smaller parallel efforts
>>> (we as Erlang programmers should be familiar with the concept) that improve
>>> the content at the same time?
>>>
>>> Since I'm preaching action rather than discussions, it would be hypocritical of
>>> me to not do anything so I've submitted a PR with a documentation change
>>> that I believe will make binary_to_term/1 a little clearer for anyone
>>> wondering how to use it for the first time:
>>> https://github.com/erlang/otp/pull/1181
>> Thank you Lukas.
>>
>> This patch made my head explode.
>>
>> How can an example of binary_to_term NOT mention term_to_binary?
>>
>> When I lecture I say that the PAIR of functions term_to_binary
>> and binary_to_term are FANTASTIC - and I hang my head in shame
>> if having read one of my books this fact was not made abundantly clear.
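
A minimal sketch of the pairing described above: encode a term with term_to_binary/1 and decode it again with binary_to_term/1. The module name roundtrip_demo is hypothetical and only there to make the example compilable.

```erlang
-module(roundtrip_demo).
-export([roundtrip/1]).

%% Encode a term in the external term format, then decode it again.
%% The decoded term should compare equal to the original.
roundtrip(Term) ->
    Bin = term_to_binary(Term),
    true = is_binary(Bin),
    binary_to_term(Bin).
```

For example, roundtrip_demo:roundtrip({user, "joe", [1,2,3]}) should hand back the same tuple, which is what makes the pair convenient for storing terms in files or sending them over sockets.
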
>>
>> NOT ALL FUNCTIONS ARE EQUALLY IMPORTANT
>>
>> The Erlang manual page is *gigantic*. I, for one, certainly don't know all
>> the functions in the Erlang manual page. I also only use a handful of
>> functions in file, dict and lists.
>>
>> It occurs to me that I probably only use 5% of the functions in file.erl
>> 8% of the functions in dict.erl and so on. I don't know the numbers here,
>> so I'm guessing.
>>
>> Since I can manage perfectly well
>> knowing only 5% of the functions in file, then I'd like to ask
>>
>>     - do we need all the other functions?
>>
>>       Answer: Yes - on rare occasions I need a specific functionality, so I have
>>       to search the manual page and read the text very carefully
>>
>>     - do beginners know which functions in (say file) are what I consider
>>       to be the important ones?
>>
>>       Answer: No - how could they
>>
>> So here's an idea .. we (or I in this case) write a 'best of' document
>> The best of should have:
>>
>>       - the *best* modules (which modules do I use most)
>>       - the *best functions* (which functions (per module) do I use most)
>>
>> I think I "know" the answers without analysis.
>>
>> So, for example, in dict.erl I'd hazard a guess that I use new/0,
>> store/3, find/2, from_list/1, to_list/1.
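
A short sketch of that dict 'working set' in use. The module name dict_demo and the keys and values are made up for illustration; the dict calls themselves are the standard ones from stdlib.

```erlang
-module(dict_demo).
-export([demo/0]).

%% Walk through the handful of dict functions named above.
demo() ->
    D0 = dict:new(),
    D1 = dict:store(lang, erlang, D0),
    D2 = dict:store(level, beginner, D1),
    %% find/2 returns {ok, Value} for a known key and error otherwise.
    {ok, erlang} = dict:find(lang, D2),
    error = dict:find(missing, D2),
    %% from_list/1 and to_list/1 convert to and from key/value lists.
    D3 = dict:from_list([{a, 1}, {b, 2}]),
    lists:sort(dict:to_list(D3)).
```

The result of to_list/1 is sorted here because dict makes no promise about the order of the pairs it returns.
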
>>
>> For me these functions are in my 'working set' of functions. ie the
>> functions that I use a lot and are in my memory.
>>
>> It seems to me that it might be beneficial to make some special
>> 'best of' documentation that only documents the essential core functions in
>> the libraries.
>>
>> I'd also like to compare my 'best of' lists with other programmers on this list.
>> What are Richard's and Lukas's favorite functions?
>>
>> I can easily write a program to statically analyse my code to pull out
>> these figures.
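
One rough way such a static analysis could look, as a sketch only: scan the source with erl_scan and count every Module:Function( token sequence. The module name call_count is hypothetical, arities are ignored, and remote types in specs (for example dict:dict() inside a -spec) would be counted as calls too, so the figures are approximate.

```erlang
-module(call_count).
-export([count_file/1, count_string/1]).

%% Count remote calls in a source file.
count_file(Path) ->
    {ok, Bin} = file:read_file(Path),
    count_string(unicode:characters_to_list(Bin)).

%% Returns [{{Mod, Fun}, Count}], most frequently used first.
count_string(Source) ->
    {ok, Tokens, _End} = erl_scan:string(Source),
    Counts = count(Tokens, #{}),
    lists:reverse(lists:keysort(2, maps:to_list(Counts))).

%% A remote call shows up as: atom ':' atom '('
count([{atom, _, M}, {':', _}, {atom, _, F}, {'(', _} | Rest], Acc) ->
    Inc = fun(N) -> N + 1 end,
    count(Rest, maps:update_with({M, F}, Inc, 1, Acc));
count([_ | Rest], Acc) ->
    count(Rest, Acc);
count([], Acc) ->
    Acc.
```

Running count_file/1 over every .erl file in a project and merging the results would give a first cut of a personal 'best of' list.
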
>>
>> I think it would be a good idea if, instead of throwing 10MBytes of
>> documentation at beginners, we aimed to tell them what the essential
>> set of modules is, what the essential functions are in those modules,
>> and how they should be used.
>>
>> As for advanced users - the documentation we have is all we have - and
>> gradual improvement is needed.
>>
>> Cheers
>>
>> /Joe
>>
>>> Lukas
>>>
>>> On Mon, Sep 26, 2016 at 10:03 PM, Joe Armstrong <erlang@REDACTED> wrote:
>>>> Regarding the documentation - there is a step0 that needs to be done
>>>> *before* attacking the document improvement problem.
>>>>
>>>> The following things are needed (in my opinion)
>>>>
>>>> 1) The number of DTDs in the documentation should be reduced to ONE.
>>>>       There are currently 26; these are in
>>>>         otp_src_<vsn>/lib/erl_docgen/priv/dtd
>>>>
>>>> 2) tags that are used infrequently should be removed and the source XML
>>>>      corrected. Some tags are virtually *never* used.
>>>>
>>>> 3) All XML files should validate with the new DTD
>>>>
>>>> These steps need quite a lot of work ...
>>>>
>>>> 4) The Erlang parser should be changed to exactly
>>>> reproduce the source.
>>>> Right now the parse tree of correct erlang has all the comments
>>>> and white space removed. I'd suggest attaching the comments to the
>>>> next following token (for example {atom,Line,theAtom} should become
>>>>      {atom, Line, theAtom, "the preceding comments and white space"}
>>>> It should be possible to *exactly* reconstruct the input from the parse
>>>> tree.
>>>>
>>>> <aside> - in the first erlang all the different ways of writing an integer
>>>> ended up as the same token. So writing 16#fc was the same as writing the
>>>> integer 252 and tokenized as {integer,Line,252} - the tokenizer threw
>>>> away the exact input so it was impossible to reconstruct the source
>>>> from the token stream. Now it's better: 16#fc is tokenized as
>>>> {integer,[{location,{Line,Col}},{text,"16#fc"}], 252} - but comments
>>>> and white space are not
>>>> retained in the parse tree.
>>>>
>>>> Note that changing the parse tree is *not* a simple hack - all tools that
>>>> depend upon the parse tree have to be changed.
>>>> </aside>
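
At the token level (not the parse tree) the scanner can already keep this information. A minimal sketch, assuming the return and text options of erl_scan:string/3 behave as documented in current OTP releases; the module scan_demo and its function names are hypothetical.

```erlang
-module(scan_demo).
-export([tokens/1, rebuild/1]).

%% Scan Source keeping comments and white space as tokens ('return')
%% and recording the exact input text of every token ('text').
tokens(Source) ->
    {ok, Tokens, _End} = erl_scan:string(Source, {1, 1}, [return, text]),
    Tokens.

%% Glue the recorded token texts back together.
rebuild(Source) ->
    lists:append([erl_scan:text(T) || T <- tokens(Source)]).
```

With these options the integer token still carries the value 252 while its text stays "16#fc", and concatenating the texts should give back the input. Nothing here changes what erl_parse produces, so the point about the parse tree still stands.
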
>>>>
>>>> 5) We should decide how to attach "floating" comments in the source.
>>>> Does a comment *before* a function apply to the next function or not?
>>>>
>>>> 6) We need some "injection" API to inject code, meta-data, examples
>>>> and documentation into a database.
>>>>
>>>>     For example, inject:code("foo.erl") should inject a load of key/value
>>>> pairs into a database, with something like the following keys
>>>>
>>>>       {text, Mod, Func, Arity} => the source code text
>>>>       {spec, Mod, Func, Arity} => the spec
>>>>       {doc, Mod, Func, Arity} => the documentation
>>>>       {examples, Mod, Func, Arity} => [Examples]
>>>>
>>>> The entities in the database should be sufficient to reconstruct the
>>>> original text, and perform various analysis of the functions.
>>>>
>>>> I think *most* of the problems involved are due to the difficulty of
>>>> extracting information from the source files and editing this when it is
>>>> wrong.
>>>>
>>>> I'm currently trying to do parts of this.
>>>>
>>>> A "relatively simple" program should be able to (for a given function)
>>>>
>>>>      - find the exact source text
>>>>      - find all old versions
>>>>      - find the specs and types referred to
>>>>      - find the documentation
>>>>      - find the test cases
>>>>
>>>> Doing so involves analysing the erlang sources, the XML sources and the
>>>> test cases, and involves a good deal of guesswork.
>>>>
>>>> All this must be done on a moving target and should not break the
>>>> existing system.
>>>>
>>>> I suspect that the code to accurately manipulate the code and
>>>> documentation has been written several times in different
>>>> projects (for example, the wrangler and the Eclipse interface both
>>>> need to manipulate the source in various ways).
>>>>
>>>> On Mon, Sep 26, 2016 at 6:41 PM,  <lloyd@REDACTED> wrote:
>>>>> Joe,
>>>>>
>>>>> You've said so well what I've been trying to harp on.
>>>>>
>>>>> My most recent timesink has been trying to understand xmerl sufficiently
>>>>> well to pull book data out of several different book APIs. Dave Thomas's
>>>>> 2007 tutorial has been a big help, but the black holes in my understanding
>>>>> still significantly impede my progress. So far I've spent maybe 10 to 15
>>>>> hours trying to scope it out. I can get much of what I need from Amazon's
>>>>> APIs, but I need a redundant source. The Library of Congress API completely
>>>>> eludes me; I get a little further with ISBNdb, but still not far enough.
>>>>>
>>>>> Given discussion on the documentation thread to date, it seems to me
>>>>> that there are four issues at stake:
>>>>>
>>>>> 1) Content deficiencies
>>>>> 2) Formatting issues
>>>>> 3) Lack of consensus of what we, as a community, want
>>>>> 4) How we move forward toward comprehensive improvement of
>>>>> documentation.
>>>>>
>>>>> Lukas Larsson's most recent post makes a good point.
>>>>>
>>>>> Bruce Yinhe tells me in a private post that his group is about to hire
>>>>> one person on a part-time basis to work on documentation improvements.
>>>>>
>>>>> I've lost it in the thread, but as I recall we had some promising
>>>>> interest in documentation improvements from an Ericsson employee.
>>>>>
>>>>> It would be great if we could begin to rally around these comments and
>>>>> find some kind of convergence toward progress.
>>>>>
>>>>> My take is to break the large task down into small chunks, bring the
>>>>> intelligence and resources of the community to bear on one specific issue at
>>>>> a time, and get it done.
>>>> I think many (most) of the problems arise because what we are ultimately
>>>> doing is changing the content of a file at some place.
>>>>
>>>> Fixing a typo/bug involves
>>>>
>>>>     1 - finding the appropriate file
>>>>     2 - changing the file at the appropriate place
>>>>     3 - updating the file (somehow)
>>>>     4 - generating the downstream documents that depend upon the file
>>>>
>>>> All these steps are difficult
>>>>
>>>> We can imagine a simpler way:
>>>>
>>>> Suppose a file is a sequence of paragraphs. Each paragraph
>>>> has a GUID.
>>>>
>>>> In (say) HTML
>>>>
>>>>    <p guid="b92a2705-3449-4fb9-8f11-fa55f7ead29f">
>>>>       This is my paragraph ...
>>>>    </p>
>>>>
>>>> If I want to update the paragraphs I just send a message
>>>>
>>>>      {update, "b92a2705-3449-4fb9-8f11-fa55f7ead29f",
>>>>         "the new content"}
>>>>
>>>> to some server - this should be checked (manually) and then
>>>> if approved used to update everything.
>>>>
>>>> In other threads I have argued that *everything* should be in a global
>>>> database with a huge DHT tracking where things are.
>>>>
>>>> A key to "changing things" is "naming things" and "finding things"
>>>>
>>>> Yet another (even simpler) alternative is to have a message
>>>>
>>>>      {change,SHA,NewText}
>>>>
>>>> Meaning "change the paragraph with sha1 checksum <SHA> to <NewText>"
>>>>
>>>> Implementing this is easy - BUT all paragraphs with the same SHA
>>>> would be changed - which might not be what we want.
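
A small sketch of the {change, SHA, NewText} idea, assuming paragraphs are held as a list of binaries and using crypto:hash/2 for the SHA-1. The module para_change and its function names are hypothetical. It also shows the caveat above: every paragraph with a matching checksum is replaced.

```erlang
-module(para_change).
-export([sha/1, handle/2]).

%% SHA-1 checksum of a paragraph (a 20-byte binary).
sha(Text) ->
    crypto:hash(sha, Text).

%% Apply a {change, Sha, NewText} message to a list of paragraphs.
%% All paragraphs whose checksum equals Sha are replaced.
handle({change, Sha, NewText}, Paragraphs) ->
    [case sha(P) =:= Sha of
         true -> NewText;
         false -> P
     end || P <- Paragraphs].
```

Given [<<"one">>, <<"two">>, <<"one">>] and a change addressed to the checksum of <<"one">>, both copies are rewritten, which may or may not be what the sender intended.
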
>>>>
>>>> I have (incidentally) experimented with this - tagging all paragraphs
>>>> with their SHAs and sticking the results in a database.
>>>>
>>>> <crazy idea>
>>>> make a server that accepts messages of the form
>>>>
>>>>     {change, <SHA>, <New Text>}
>>>>
>>>> The server finds a paragraph with sha1 checksum <SHA> and changes
>>>> it to <New Text> - it changes the appropriate file in a GIT archive,
>>>> does all the add/commit/push magic, and the job is done.
>>>>
>>>> (I think I'll implement this for fun :-)
>>>>
>>>> </crazy idea>
>>>>
>>>>> I don't have the technical chops, but I'll gladly work with you or
>>>>> anyone else to address content issues on a module-by-module basis. I can ask
>>>>> Micky-the_Dunce questions and, perhaps, help clarify language. It would be
>>>>> great if you could help clarify intent and application issues.
>>>>>
>>>>> All the best,
>>>>>
>>>>> Lloyd
>>>>>
>>>>> -----Original Message-----
>>>>> From: "Joe Armstrong" <erlang@REDACTED>
>>>>> Sent: Monday, September 26, 2016 6:42am
>>>>> To: "Lukas Larsson" <lukas@REDACTED>
>>>>> Cc: "Lloyd R. Prentice" <lloyd@REDACTED>,
>>>>> "erlang-questions@REDACTED" <erlang-questions@REDACTED>
>>>>> Subject: Re: [erlang-questions] Erlang documentation -- a modest
>>>>> proposal
>>>>>
>>>>> I think what I miss most are *examples*
>>>>>
>>>>> I've just been reading the edoc manual pages for a program
>>>>> called <XYZ> (name changed to avoid embarrassment)
>>>>>
>>>>> The functions are well documented - the types are well documented
>>>>> but I haven't a clue about which ORDER to call the functions
>>>>>
>>>>> Imagine a file system.
>>>>>
>>>>> We *document* the open, read, write, and close functions
>>>>> but we don't say you have to open the file before we read it.
>>>>> We don't say that when we're done we have to close the file.
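
The kind of example that makes the calling order explicit can be very short. A sketch using the standard file module; the module name file_order_demo is hypothetical.

```erlang
-module(file_order_demo).
-export([first_line/1]).

%% The order matters: open first, then read, and always close.
first_line(Path) ->
    {ok, Fd} = file:open(Path, [read]),
    try
        %% Returns {ok, Line}, eof, or {error, Reason}.
        file:read_line(Fd)
    after
        %% Runs whether or not the read succeeded.
        file:close(Fd)
    end.
```

Nothing in the individual function descriptions says this, but one example shows it at a glance.
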
>>>>>
>>>>> We don't say this because it is *obvious*
>>>>>
>>>>> But for the module glonk, which exports, zizzle, taddle, glonk and plonk
>>>>> it is NOT obvious. Yes sure you all know you have to call glonk 3 times
>>>>> before calling plonk - but I don't know.
>>>>>
>>>>> That's why we need examples.
>>>>>
>>>>> Often I search for a tutorial and find a ten line blog posting that
>>>>> actually shows me how to use a library - this gets me started.
>>>>>
>>>>> Very short unit tests, placed inline, are *very* useful
>>>>>
>>>>> for example:
>>>>>
>>>>>       "321" = lists:reverse("123")
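
One way to keep such one-liners both readable and checked is to put them in an EUnit test. A minimal sketch, assuming EUnit is available (it ships with OTP); the module name reverse_examples is hypothetical.

```erlang
-module(reverse_examples).
-include_lib("eunit/include/eunit.hrl").

%% Each assertion doubles as a usage example for lists:reverse/1.
reverse_test() ->
    ?assertEqual("321", lists:reverse("123")),
    ?assertEqual([], lists:reverse([])).
```

Calling reverse_examples:test() runs the assertions, so an example that drifts out of date fails instead of quietly misleading the reader.
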
>>>>>
>>>>> The unit tests *are* the examples - what we don't have is software that
>>>>> parses the code, parses the documentation, parses the unit tests
>>>>> and munges it all together into a form that is convenient to read.
>>>>>
>>>>> I'm actually trying to write something like this now - hence my wails
>>>>> of anguish over css.
>>>>>
>>>>> Wish me luck
>>>>>
>>>>> /Joe
>>>>>
>>>>> On Mon, Sep 26, 2016 at 11:58 AM, Lukas Larsson <lukas@REDACTED>
>>>>> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On Thu, Sep 22, 2016 at 11:56 PM, Lloyd R. Prentice
>>>>>> <lloyd@REDACTED>
>>>>>> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> To date, this thread has generated quite a few worthwhile insights and
>>>>>>> ideas. My fear is that they will be deep-sixed into the archive. On
>>>>>>> the
>>>>>>> other hand, major revision is a daunting task and unlikely to happen.
>>>>>>>
>>>>>>> But maybe we can focus on specific issues and make iterative headway.
>>>>>>>
>>>>>>> Fewer than half of the functions in the lists library, for instance,
>>>>>>> have code examples. Suppose over the span of one week we were to
>>>>>>> collectively focus on generating at least two code examples for each
>>>>>>> function in one library.
>>>>>>>
>>>>>>> At the end of the week we could organize the submissions and vote on
>>>>>>> best
>>>>>>> candidates for inclusion in the docs. That done, we can pick another
>>>>>>> module.
>>>>>>>
>>>>>>> Thus, with not much effort from any one individual, a small posse of
>>>>>>> volunteer Erlang wizards could make short work of deficiencies in the
>>>>>>> docs.
>>>>>>>
>>>>>>> Anyway, it's an idea.
>>>>>>>
>>>>>>> All the best,
>>>>>>>
>>>>>>> LRP
>>>>>>>
>>>>>> I think that it is great to see everyone talking about wanting to
>>>>>> improve the documentation. The contributions to the Erlang/OTP project
>>>>>> that I value the most are documentation changes that make the intention
>>>>>> clearer, or explain some corner case somewhere which the docs did not
>>>>>> initially mention.
>>>>>>
>>>>>> Unfortunately, once one has figured out how a function works there
>>>>>> seems to
>>>>>> be very little incentive to make the docs clearer. I would estimate
>>>>>> that
>>>>>> about every 20th pull request we get is a documentation fix, and more
>>>>>> than
>>>>>> half of those are fixes of speling misstakes (which are great!).
>>>>>>
>>>>>> I've just come back from about two weeks of vacation and this
>>>>>> discussion has
>>>>>> resulted in roughly 0 pull requests for changes in the documentation.
>>>>>> Would
>>>>>> it be possible to steer this discussion into doing something instead of
>>>>>> talking about doing something? Yes the technology/layout is not
>>>>>> perfect, but
>>>>>> as Loïc said, it is the content that matters the most.
>>>>>>
>>>>>> Lukas
>>>>>> // my own oppinions
>>>>>>
>>>>>> _______________________________________________
>>>>>> erlang-questions mailing list
>>>>>> erlang-questions@REDACTED
>>>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>>>
>>>>>
>>>