[erlang-questions] Erlang documentation -- a modest proposal

Joe Armstrong erlang@REDACTED
Tue Sep 27 08:26:52 CEST 2016


On Mon, Sep 26, 2016 at 10:51 PM, Lukas Larsson <lukas@REDACTED> wrote:
> Hello Joe,
>
> What you are talking about is about lowering the cost of entry when doing
> documentation updates. This is a great effort and one which we should really
> pursue, but as you say it is a lot of work. Is there really a *need* for
> that to be in place first? Can't there be lots of smaller parallel efforts
> (we as Erlang programmers should be familiar with the concept) that improve
> the content at the same time?
>
> Since I'm preaching action rather than discussions, it would be hypocritical of
> me to not do anything, so I've submitted a PR with a documentation change
> that I believe will make binary_to_term/1 a little clearer for anyone
> wondering how to use it for the first time:
> https://github.com/erlang/otp/pull/1181

Thank you Lukas.

This patch made my head explode.

How can an example of binary_to_term NOT mention term_to_binary?

When I lecture I say that the PAIR of functions term_to_binary
and binary_to_term are FANTASTIC - and I hang my head in shame
if, having read one of my books, this fact was not made abundantly clear.
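
To make the pairing concrete, here is a minimal sketch of the round trip.
The module name roundtrip_demo and the sample term are illustrative
assumptions; they are not taken from the PR under discussion.

```erlang
-module(roundtrip_demo).
-export([roundtrip/1, demo/0]).

%% Encode any term into the external term format, then decode it again.
roundtrip(Term) ->
    Bin = term_to_binary(Term),
    true = is_binary(Bin),
    binary_to_term(Bin).

%% The decoded term must match the original exactly.
demo() ->
    Term = {person, "Joe", [erlang, prolog], #{year => 2016}},
    Term = roundtrip(Term),
    ok.
```

When the binary comes from an untrusted source, binary_to_term(Bin, [safe])
is probably the better call, since it refuses to create new atoms.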

NOT ALL FUNCTIONS ARE EQUALLY IMPORTANT

The Erlang manual page is *gigantic*. I, for one, certainly don't know all
the functions in the Erlang manual page - I also only use a handful of
functions in file, dict and lists.

It occurs to me that I probably only use 5% of the functions in file.erl,
8% of the functions in dict.erl and so on. I don't know the numbers here,
so I'm guessing.

Since I can manage perfectly well
knowing only 5% of the functions in file, then I'd like to ask

   - do we need all the other functions?

     Answer: Yes - on rare occasions I need a specific functionality, so I have
     to search the manual page and read the text very carefully

   - do beginners know which functions in (say file) are what I consider
     to be the important ones?

     Answer: No - how could they

So here's an idea: we (or I in this case) write a 'best of' document.
The 'best of' should have:

     - the *best* modules (which modules do I use most)
     - the *best functions* (which functions (per module) do I use most)

I think I "know" the answers without analysis.

So, for example, in dict.erl I'd hazard a guess that I use new/0,
store/3, find/2, from_list/1, to_list/1.

For me these functions are in my 'working set' of functions. ie the
functions that I use a lot and are in my memory.
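
As a rough illustration of that working set, the sketch below exercises just
those five dict functions (with their actual arities, dict:new/0 and
dict:to_list/1). The module name dict_demo and the keys and values are made
up for the example.

```erlang
-module(dict_demo).
-export([demo/0]).

demo() ->
    %% new/0 creates an empty dictionary; store/3 adds or replaces a key.
    D0 = dict:new(),
    D1 = dict:store(colour, red, D0),
    D2 = dict:store(size, 42, D1),

    %% find/2 returns {ok, Value} for a known key and the atom error otherwise.
    {ok, red} = dict:find(colour, D2),
    error = dict:find(weight, D2),

    %% from_list/1 and to_list/1 convert to and from key/value lists.
    %% The order returned by to_list/1 is not specified, so sort before matching.
    D3 = dict:from_list([{a, 1}, {b, 2}]),
    [{a, 1}, {b, 2}] = lists:sort(dict:to_list(D3)),
    ok.
```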

It seems to me that it might be beneficial to make some special
'best of' documentation that only documents the essential core functions in
the libraries.

I'd also like to compare my 'best of' lists with other programmers on this list.
What are Richard's and Lukas's favorite functions?

I can easily write a program to statically analyse my code to pull out
these figures.
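
A minimal sketch of what such a program might look like, assuming it is
enough to count remote calls of the form Mod:Fun(Args) in the abstract forms
of each file. The module name call_count and its functions are hypothetical,
and calls hidden behind macros, apply or funs are counted only as far as the
preprocessor exposes them.

```erlang
-module(call_count).
-export([file/1, forms/1, top/2]).

%% Count remote calls in one source file. epp:parse_file/2 runs the
%% preprocessor, so any include files must be findable.
file(Path) ->
    {ok, Forms} = epp:parse_file(Path, []),
    forms(Forms).

%% Count remote calls in a list of abstract forms or expressions.
%% Returns a map of {Mod, Fun, Arity} => Count.
forms(Forms) ->
    walk(Forms, #{}).

%% The N most frequently called functions, most used first.
top(Counts, N) ->
    Sorted = lists:reverse(lists:keysort(2, maps:to_list(Counts))),
    lists:sublist(Sorted, N).

%% Generic walk over the parse tree: record each Mod:Fun(Args) call,
%% then keep descending into its arguments.
walk({call, _, {remote, _, {atom, _, M}, {atom, _, F}}, Args}, Acc) ->
    Key = {M, F, length(Args)},
    walk(Args, maps:update_with(Key, fun(N) -> N + 1 end, 1, Acc));
walk(T, Acc) when is_tuple(T) ->
    walk(tuple_to_list(T), Acc);
walk([H | T], Acc) ->
    walk(T, walk(H, Acc));
walk(_, Acc) ->
    Acc.
```

Something like call_count:top(call_count:file("my_module.erl"), 10) would
then list the ten most used functions in one (hypothetical) file; merging the
maps over a whole source tree gives the per-module figures.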

I think it would be a good idea if, instead of throwing 10 MBytes of
documentation at beginners, we aimed to tell them what the essential
set of modules is, what the essential functions are in those modules,
and how they should be used.

As for advanced users - the documentation we have is all we have - and
gradual improvement is needed.

Cheers

/Joe










>
> Lukas
>
> On Mon, Sep 26, 2016 at 10:03 PM, Joe Armstrong <erlang@REDACTED> wrote:
>>
>> Regarding the documentation - there is a step0 that needs to be done
>> *before* attacking the document improvement problem.
>>
>> The following things are needed (in my opinion)
>>
>> 1) The number of DTDs in the documentation should be reduced to ONE.
>>      There are currently 26; these are in
>>        otp_src_<vsn>/lib/erl_docgen/priv/dtd
>>
>> 2) Tags that are used infrequently should be removed and the source XML
>>     corrected. Some tags are virtually *never* used.
>>
>> 3) All XML files should validate with the new DTD
>>
>> These steps need quite a lot of work ...
>>
>> 4) The Erlang parser should be changed to exactly
>> reproduce the source.
>> Right now the parse tree of correct Erlang has all the comments
>> and white space removed. I'd suggest attaching the comments to the
>> next following token (for example {atom,Line,theAtom} should become
>>     {atom, Line, theAtom, "the preceding comments and white space"}).
>> It should be possible to *exactly* reconstruct the input from the parse
>> tree.
>>
>> <aside> - in the first erlang all the different ways of writing an integer
>> ended up as the same token. So writing 16#fc was the same as writing the
>> integer 252 and tokenized as {integer,Line,252} - the tokenizer threw
>> away the exact input so it was impossible to reconstruct the source
>> from the token stream. Now it's better: 16#fc is tokenized as
>> {integer,[{location,{Line,Col}},{text,"16#fc"}], 252} - but comments
>> and white space are not retained in the parse tree.
>>
>> Note that changing the parse tree is *not* a simple hack - all tools that
>> depend upon the parse tree have to be changed.
>> </aside>
>>
>>
>>
>>
>>
>> 5) We should decide how to attach "floating" comments in the source.
>> Does a comment *before* a function apply to the next function or not?
>>
>> 6) We need some "injection" API to inject code, meta-data, examples
>> and documentation into a database.
>>
>>    For example   inject:code("foo.erl") should inject a load of key/value
>> pairs into a database, with something like the following keys
>>
>>      {text, Mod, Func, Arity} => the source code text
>>      {spec, Mod, Func, Arity} => the spec
>>      {doc, Mod, Func, Arity} => the documentation
>>      {examples, Mod, Func, Arity} => [Examples]
>>
>> The entities in the database should be sufficient to reconstruct the
>> original text, and perform various analyses of the functions.
>>
>> I think *most* of the problems involved are due to the difficulty of
>> extracting information from the source files and editing this when it is
>> wrong.
>>
>> I'm currently trying to do parts of this.
>>
>> A "relatively simple" program should be able to (for a given function)
>>
>>     - find the exact source text
>>     - find all old versions
>>     - find the specs and types referred to
>>     - find the documentation
>>     - find the test cases
>>
>> Doing so involves analysing the Erlang sources, the XML sources and the
>> test cases, and involves a good deal of guesswork.
>>
>> All this must be done on a moving target and should not break the
>> existing system.
>>
>> I suspect that the code to accurately manipulate the code and
>> documentation has been written several times in different
>> projects (for example, Wrangler and the Eclipse interface both
>> need to manipulate the source in various ways).
>>
>>
>>
>>
>> On Mon, Sep 26, 2016 at 6:41 PM,  <lloyd@REDACTED> wrote:
>> > Joe,
>> >
>> > You've said so well what I've been trying to harp on.
>> >
>> > My most recent timesink has been trying to understand xmerl sufficiently
>> > well to pull book data out of several different book APIs. Dave Thomas's
>> > 2007 tutorial has been a big help, but the black holes in my understanding
>> > still significantly impede my progress. So far I've spent maybe 10 to 15
>> > hours trying to scope it out. I can get much of what I need from Amazon's
>> > APIs, but I need a redundant source. The Library of Congress API completely
>> > eludes me; I get a little further with ISBNdb, but still not far enough.
>> >
>> > Given discussion on the documentation thread to date, it seems to me
>> > that there are four issues at stake:
>> >
>> > 1) Content deficiencies
>> > 2) Formatting issues
>> > 3) Lack of consensus of what we, as a community, want
>> > 4) How we move forward toward comprehensive improvement of
>> > documentation.
>> >
>> > Lukas Larsson's most recent post makes a good point.
>> >
>> > Bruce Yinhe tells me in a private post that his group is about to hire
>> > one person on a part-time basis to  work on documentation improvements.
>> >
>> > I've lost it in the thread, but as I recall we had some promising
>> > interest in documentation improvements from an Ericsson employee.
>> >
>> > It would be great if we could begin to rally around these comments and
>> > find some kind of convergence toward progress.
>> >
>> > My take is to break the large task down into small chunks, bring the
>> > intelligence and resources of the community to bear on one specific issue at
>> > a time, and get it done.
>>
>> I think many (most) of the problems arise because what we are ultimately
>> doing is changing the content of a file at some place.
>>
>> Fixing a typo/bug involves
>>
>>    1 - finding the appropriate file
>>    2 - changing the file at the appropriate place
>>    3 - updating the file (somehow)
>>    4 - generating the downstream documents that depend upon the file
>>
>> All these steps are difficult.
>>
>> We can imagine a simpler way:
>>
>> Suppose a file is a sequence of paragraphs. Each paragraph
>> has a GUID.
>>
>> In (say) HTML
>>
>>   <p   guid = "b92a2705-3449-4fb9-8f11-fa55f7ead29f">
>>      This is my paragraph ...
>>   </p>
>>
>> If I want to update the paragraph I just send a message
>>
>>     {update,"b92a2705-3449-4fb9-8f11-fa55f7ead29f",
>>        "the new content"}
>>
>> to some server - this should be checked (manually) and then
>> if approved used to update everything.
>>
>> In other threads I have argued that *everything* should be in a global
>> database with a huge DHT tracking where things are.
>>
>> A key to "changing things" is "naming things" and "finding things"
>>
>> Yet another (even simpler) alternative is to have a message
>>
>>     {change,SHA,NewText}
>>
>> Meaning "change the paragraph with sha1 checksum <SHA> to <NewText>"
>>
>> Implementing this is easy - BUT all paragraphs with the same SHA
>> would be changed - which might not be what we want.
>>
>> I have (incidentally) experimented with this - tagging all paragraphs
>> with their SHAs and sticking the results in a database.
>>
>> <crazy idea>
>> make a server that accepts messages of the form
>>
>>    {change, <SHA>, <New Text>}
>>
>> The server finds a paragraph with sha1 checksum <SHA> and changes
>> it to <New Text> - it changes the appropriate file in a Git archive,
>> does all the add/commit/push magic and the job is done.
>>
>> (I think I'll implement this for fun :-)
>>
>> </crazy idea>
>>
>> >
>> > I don't have the technical chops, but I'll gladly work with you or
>> > anyone else to address content issues on a module-by-module basis. I can ask
>> > Micky-the_Dunce questions and, perhaps, help clarify language. It would be
>> > great if you could help clarify intent and application issues.
>> >
>> > All the best,
>> >
>> > Lloyd
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: "Joe Armstrong" <erlang@REDACTED>
>> > Sent: Monday, September 26, 2016 6:42am
>> > To: "Lukas Larsson" <lukas@REDACTED>
>> > Cc: "Lloyd R. Prentice" <lloyd@REDACTED>,
>> > "erlang-questions@REDACTED" <erlang-questions@REDACTED>
>> > Subject: Re: [erlang-questions] Erlang documentation -- a modest
>> > proposal
>> >
>> > I think what I miss most are *examples*
>> >
>> > I've just been reading the edoc manual pages for a program
>> > called <XYZ> (name changed to avoid embarrassment)
>> >
>> > The functions are well documented - the types are well documented -
>> > but I haven't a clue about which ORDER to call the functions in.
>> >
>> > Imagine a file system.
>> >
>> > We *document* the open, read, write, and close functions
>> > but we don't say you have to open the file before you read it.
>> > We don't say that when we're done we have to close the file.
>> >
>> > We don't say this because it is *obvious*
>> >
>> > But for the module glonk, which exports, zizzle, taddle, glonk and plonk
>> > it is NOT obvious. Yes sure you all know you have to call glonk 3 times
>> > before calling plonk - but I don't know.
>> >
>> > That's why we need examples.
>> >
>> > Often I search for a tutorial and find a ten line blog posting that
>> > actually shows me how to use a library - this gets me started.
>> >
>> > Very short unit tests, placed inline, are *very* useful
>> >
>> > for example:
>> >
>> >      "321" = lists:reverse("123")
>> >
>> > The unit tests *are* the examples - what we don't have is software that
>> > parses the code, parses the documentation, parses the unit tests
>> > and munges it all together into a form that is convenient to read.
>> >
>> > I'm actually trying to write something like this now - hence my wails
>> > of anguish over css.
>> >
>> > Wish me luck
>> >
>> > /Joe
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Sep 26, 2016 at 11:58 AM, Lukas Larsson <lukas@REDACTED>
>> > wrote:
>> >> Hello,
>> >>
>> >> On Thu, Sep 22, 2016 at 11:56 PM, Lloyd R. Prentice
>> >> <lloyd@REDACTED>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> To date, this thread has generated quite a few worthwhile insights and
>> >>> ideas. My fear is that they will be deep-sixed into the archive. On
>> >>> the
>> >>> other hand, major revision is a daunting task and unlikely to happen.
>> >>>
>> >>> But maybe we can focus on specific issues and make iterative headway.
>> >>>
>> >>> Fewer than half of the functions in the lists library, for instance,
>> >>> have
>> >>> code examples. Suppose over the span of one week we were to collectively
>> >>> focus
>> >>> on generating at least two code examples for each function in one
>> >>> library.
>> >>>
>> >>> At the end of the week we could organize the submissions and vote on
>> >>> best
>> >>> candidates for inclusion in the docs. That done, we can pick another
>> >>> module.
>> >>>
>> >>> Thus, with not much effort from any one individual, a small posse of
>> >>> volunteer Erlang wizards could make short work of deficiencies in the
>> >>> docs.
>> >>>
>> >>> Anyway, it's an idea.
>> >>>
>> >>> All the best,
>> >>>
>> >>> LRP
>> >>>
>> >>
>> >> I think that it is great to see everyone talking about wanting to
>> >> improve
>> >> the documentation. The contributions to the Erlang/OTP project that I
>> >> value
>> >> the most are documentation changes that make the intention clearer, or
>> >> explain some corner case somewhere which the docs did not initially
>> >> mention.
>> >>
>> >> Unfortunately, once one has figured out how a function works there
>> >> seems to
>> >> be very little incentive to make the docs clearer. I would estimate
>> >> that
>> >> about every 20th pull request we get is a documentation fix, and more
>> >> than
>> >> half of those are fixes of speling misstakes (which are great!).
>> >>
>> >> I've just come back from about two weeks of vacation and this
>> >> discussion has
>> >> resulted in roughly 0 pull requests for changes in the documentation.
>> >> Would
>> >> it be possible to steer this discussion into doing something instead of
>> >> talking about doing something? Yes the technology/layout is not
>> >> perfect, but
>> >> as Loïc said, it is the content that matters the most.
>> >>
>> >> Lukas
>> >> // my own oppinions
>> >>
>> >> _______________________________________________
>> >> erlang-questions mailing list
>> >> erlang-questions@REDACTED
>> >> http://erlang.org/mailman/listinfo/erlang-questions
>> >>
>> >
>> >
>
>


