[erlang-questions] Erlang documentation -- a modest proposal

Mon Sep 26 22:03:15 CEST 2016

Regarding the documentation - there is a step0 that needs to be done
*before* attacking the document improvement problem.

The following things are needed (in my opinion):

1) The number of DTDs in the documentation should be reduced to ONE.
     There are currently 26; these are in
       otp_src_<vsn>/lib/erl_docgen/priv/dtd

2) tags that are used infrequently should be removed and the source XML
    corrected. Some tags are virtually *never* used.

3) All XML files should validate with the new DTD

These steps need quite a lot of work ...

4) The Erlang parser should be changed to exactly
reproduce the source.
Right now the parse tree of correct Erlang has all the comments
and white space removed. I'd suggest attaching the comments to the
next following token (for example {atom,Line,theAtom} should become
    {atom, Line, theAtom, "the preceding comments and white space"}).
It should be possible to *exactly* reconstruct the input from the parse
tree.

<aside> - in the first Erlang all the different ways of writing an integer
ended up as the same token. So writing 16#fc was the same as writing the
integer 252 and tokenized as {integer,Line,252} - the tokenizer threw
away the exact input so it was impossible to reconstruct the source
from the token stream. Now it's better: 16#fc is tokenized as
{integer,[{location,{Line,Col}},{text,"16#fc"}], 252} - but comments
and white space are not retained in the parse tree.

Note that changing the parse tree is *not* a simple hack - all tools that
depend upon the parse tree have to be changed.
</aside>
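
To make the token-level half of this concrete, here is a minimal sketch
using erl_scan with the `return` and `text` options. With both options set
the scanner keeps white space and comment tokens and records the exact
input text of every token, so gluing the texts together should give back
the input. The module name scan_sketch and its function names are made up
for illustration. This only covers the token stream; the parse tree built
by erl_parse still drops comments and white space.

```erlang
-module(scan_sketch).
-export([roundtrip/1, texts/1]).

%% Scan Source, keeping white space and comment tokens (return) and
%% the exact input text of every token (text).
tokens(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source, 1, [return, text]),
    Tokens.

%% The text of each token, in input order.
texts(Source) ->
    [erl_scan:text(T) || T <- tokens(Source)].

%% Glue the token texts back together.
roundtrip(Source) ->
    lists:append(texts(Source)).
```

Calling scan_sketch:texts("X = 16#fc. % hex\n") should show "16#fc" as
written, not 252, alongside the white space and comment texts.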

5) We should decide how to attach "floating" comments in the source.
Does a comment *before* a function apply to the next function or not?

6) We need some "injection" API to inject code, meta-data, examples
and documentation into a database.

   For example inject:code("foo.erl") should inject a load of key/value
pairs into a database, with something like the following keys

     {text, Mod, Func, Arity} => the source code text
     {spec, Mod, Func, Arity} => the spec
     {doc, Mod, Func, Arity} => the documentation
     {examples, Mod, Func, Arity} => [Examples]

The entities in the database should be sufficient to reconstruct the
original text, and perform various analysis of the functions.
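
A rough sketch of what the code-injection step might look like, under
several assumptions: the "database" is just a map, the input is a source
string rather than a file name, and only the {text,...} and {spec,...}
keys are filled in. The module inject_sketch and its helpers are
hypothetical; only the erl_scan and erl_parse calls are real. Comments and
white space in front of a form are stored with the form that follows them,
which is one possible answer to point 5, not a settled one.

```erlang
-module(inject_sketch).
-export([code/2]).

%% code(Mod, Source) -> #{ {text | spec, Mod, Func, Arity} => ExactText }
code(Mod, Source) ->
    {ok, Tokens, _End} = erl_scan:string(Source, 1, [return, text]),
    lists:foldl(fun(Form, Db) -> add(Mod, Form, Db) end,
                #{}, split_forms(Tokens, [], [])).

%% Cut the token list after every dot; layout tokens stay attached
%% to the form that follows them.
split_forms([], [], Acc) ->
    lists:reverse(Acc);
split_forms([], Cur, Acc) ->
    lists:reverse([lists:reverse(Cur) | Acc]);
split_forms([{dot, _} = Dot | Rest], Cur, Acc) ->
    split_forms(Rest, [], [lists:reverse([Dot | Cur]) | Acc]);
split_forms([Tok | Rest], Cur, Acc) ->
    split_forms(Rest, [Tok | Cur], Acc).

%% Store the exact text of function and spec forms; skip everything else.
add(Mod, FormTokens, Db) ->
    Text = lists:append([erl_scan:text(T) || T <- FormTokens]),
    case parse([T || T <- FormTokens, not is_layout(T)]) of
        {ok, {function, _, Name, Arity, _}} ->
            Db#{{text, Mod, Name, Arity} => Text};
        {ok, {attribute, _, spec, {{Name, Arity}, _}}} ->
            Db#{{spec, Mod, Name, Arity} => Text};
        _Other ->
            Db
    end.

%% The parser does not accept layout tokens, so it only sees the rest.
parse([]) -> none;
parse(CodeTokens) -> erl_parse:parse_form(CodeTokens).

is_layout({white_space, _, _}) -> true;
is_layout({comment, _, _}) -> true;
is_layout(_) -> false.
```

Because the stored values are the scanner's token texts rather than
pretty-printed parse trees, concatenating them in order should give back
the original source.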

I think *most* of the problems involved are due to the difficulty of
extracting information from the source files and editing this when it is wrong.

I'm currently trying to do parts of this.

A "relatively simple" program should be able to (for a given function)

    - find the exact source text
    - find all old versions
    - find the specs and types referred to
    - find the documentation
    - find the test cases

Doing so involves analysing the Erlang sources, the XML sources and the
test cases, and involves a good deal of guess work.

All this must be done on a moving target and should not break the
existing system.

I suspect that the code to accurately manipulate the code and
documentation has been written several times in different
projects (for example Wrangler and the Eclipse interface both need
to manipulate the source in various ways).

On Mon, Sep 26, 2016 at 6:41 PM,  <lloyd@REDACTED> wrote:
> Joe,
>
> You've said so well what I've been trying to harp on.
>
> My most recent timesink has been trying to understand xmerl sufficiently well to pull book data out of several different book APIs. Dave Thomas's 2007 tutorial has been a big help, but the black holes in my understanding still significantly impede my progress. So far I've spent maybe 10 to 15 hours trying to scope it out. I can get much of what I need from Amazon's APIs, but I need a redundant source. The Library of Congress API completely eludes me; I get a little further with ISBNdb, but still not far enough.
>
> Given discussion on the documentation thread to date, it seems to me that there are four issues at stake:
>
> 1) Content deficiencies
> 2) Formatting issues
> 3) Lack of consensus of what we, as a community, want
> 4) How we move forward toward comprehensive improvement of documentation.
>
> Lukas Larsson's most recent post makes a good point.
>
> Bruce Yinhe tells me in a private post that his group is about to hire one person on a part-time basis to  work on documentation improvements.
>
> I've lost it in the thread, but as I recall we had some promising interest in documentation improvements from an Ericsson employee.
>
> It would be great if we could begin to rally around these comments and find some kind of convergence toward progress.
>
> My take is to break the large task down into small chunks, bring the intelligence and resources of the community to bear on one specific issue at a time, and get it done.

I think many (most) of the problems arise because what we are ultimately
doing is changing the content of a file at some place.

Fixing a typo/bug involves

   1 - finding the appropriate file
   2 - changing the file at the appropriate place
   3 - updating the file (somehow)
   4 - generating the downstream documents that depend upon the file

All these steps are difficult.

We can imagine a simpler way:

Suppose a file is a sequence of paragraphs. Each paragraph
has a GUID.

In (say) HTML

  <p guid="b92a2705-3449-4fb9-8f11-fa55f7ead29f">
     This is my paragraph ...
  </p>

If I want to update the paragraph I just send a message

    {update, "b92a2705-3449-4fb9-8f11-fa55f7ead29f",
       "the new content"}

to some server - this should be checked (manually) and then
if approved used to update everything.
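
As a minimal sketch of that flow (not a real server): the state below is a
plain map holding the current paragraphs and the updates waiting for
manual approval. The module name guid_sketch, the function names and the
shape of the state are all assumptions made up for illustration.

```erlang
-module(guid_sketch).
-export([new/1, submit/2, approve/2]).

%% Docs is a map Guid => Text. Nothing is pending to begin with.
new(Docs) ->
    #{docs => Docs, pending => #{}}.

%% Queue an {update, Guid, NewText} message for manual review.
submit({update, Guid, NewText}, #{pending := Pending} = State) ->
    State#{pending := Pending#{Guid => NewText}}.

%% A reviewer approves the queued update for Guid; only then does
%% the paragraph itself change.
approve(Guid, #{docs := Docs, pending := Pending} = State) ->
    case maps:take(Guid, Pending) of
        {NewText, Rest} ->
            State#{docs := Docs#{Guid => NewText}, pending := Rest};
        error ->
            State
    end.
```

Regenerating the downstream documents after approve/2 is left out here.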

In other threads I have argued that *everything* should be in a global
database with a huge DHT tracking where things are.

A key to "changing things" is "naming things" and "finding things"

Yet another (even simpler) alternative is to have a message

    {change,SHA,NewText}

Meaning "change the paragraph with sha1 checksum <SHA> to <NewText>"

Implementing this is easy - BUT all paragraphs with the same SHA
would be changed - which might not be what we want.
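
A small sketch of the idea, assuming the paragraphs are held as a list of
binaries and the checksum is computed with crypto:hash(sha, Text). The
module name para_sketch is made up. It also shows the caveat: every
paragraph with a matching checksum is replaced.

```erlang
-module(para_sketch).
-export([sha/1, change/2]).

%% SHA-1 checksum of a paragraph, as a 20-byte binary.
sha(Text) ->
    crypto:hash(sha, Text).

%% Apply {change, Sha, NewText} to a list of paragraphs.
%% All paragraphs whose checksum equals Sha are replaced.
change({change, Sha, NewText}, Paragraphs) ->
    [case sha(P) =:= Sha of
         true -> NewText;
         false -> P
     end || P <- Paragraphs].
```

With the paragraphs [<<"a">>, <<"b">>, <<"a">>], changing the checksum of
<<"a">> to <<"c">> rewrites both the first and the last paragraph.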

I have (incidentally) experimented with this - tagging all paragraphs
with their SHAs and sticking the results in a database.

<crazy idea>
make a server that accepts messages of the form

   {change, <SHA>, <New Text>}

The server finds a paragraph with sha1 checksum <SHA> and changes
it to <New Text> - it changes the appropriate file in a Git archive,
does all the add/commit/push magic and the job is done.

(I think I'll implement this for fun :-)

</crazy idea>

>
> I don't have the technical chops, but I'll gladly work with you or anyone else to address content issues on a module-by-module basis. I can ask Micky-the_Dunce questions and, perhaps, help clarify language. It would be great if you could help clarify intent and application issues.
>
> All the best,
>
> Lloyd
>
>
> -----Original Message-----
> From: "Joe Armstrong" <erlang@REDACTED>
> Sent: Monday, September 26, 2016 6:42am
> To: "Lukas Larsson" <lukas@REDACTED>
> Cc: "Lloyd R. Prentice" <lloyd@REDACTED>, "erlang-questions@REDACTED" <erlang-questions@REDACTED>
> Subject: Re: [erlang-questions] Erlang documentation -- a modest proposal
>
> I think what I miss most are *examples*
>
> I've just been reading the edoc manual pages for a program
> called <XYZ> (name changed to avoid embarrassment)
>
> The functions are well documented - the types are well documented
> but I haven't a clue about which ORDER to call the functions in.
>
> Imagine a file system.
>
> We *document* the open, read, write, and close functions
> but we don't say you have to open the file before you read it.
> We don't say that when we're done we have to close the file.
>
> We don't say this because it is *obvious*
>
> But for the module glonk, which exports, zizzle, taddle, glonk and plonk
> it is NOT obvious. Yes sure you all know you have to call glonk 3 times
> before calling plonk - but I don't know.
>
> That's why we need examples.
>
> Often I search for a tutorial and find a ten line blog posting that
> actually shows me how to use a library - this gets me started.
>
> Very short unit tests, placed inline, are *very* useful
>
> for example:
>
>      "321" = lists:reverse("123")
>
> The unit tests *are* the examples - what we don't have is software that
> parses the code, parses the documentation, parses the unit tests
> and munges it all together into a form that is convenient to read.
>
> I'm actually trying to write something like this now - hence my wails
> of anguish over css.
>
> Wish me luck
>
> /Joe
>
> On Mon, Sep 26, 2016 at 11:58 AM, Lukas Larsson <lukas@REDACTED> wrote:
>> Hello,
>>
>> On Thu, Sep 22, 2016 at 11:56 PM, Lloyd R. Prentice <lloyd@REDACTED>
>> wrote:
>>>
>>> Hello,
>>>
>>> To date, this thread has generated quite a few worthwhile insights and
>>> ideas. My fear is that they will be deep-sixed into the archive. On the
>>> other hand, major revision is a daunting task and unlikely to happen.
>>>
>>> But maybe we can focus on specific issues and make iterative headway.
>>>
>>> Fewer than half of the functions in the lists library, for instance, have
>>> code examples. Suppose over the span of one week we were to collectively focus
>>> on generating at least two code examples for each function in one library.
>>>
>>> At the end of the week we could organize the submissions and vote on best
>>> candidates for inclusion in the docs. That done, we can pick another module.
>>>
>>> Thus, with not much effort from any one individual, a small posse of
>>> volunteer Erlang wizards could make short work of deficiencies in the docs.
>>>
>>> Anyway, it's an idea.
>>>
>>> All the best,
>>>
>>> LRP
>>>
>>
>> I think that it is great to see everyone talking about wanting to improve
>> the documentation. The contributions to the Erlang/OTP project that I value
>> the most are documentation changes that make the intention clearer, or
>> explain some corner case somewhere which the docs did not initially
>> mention.
>>
>> Unfortunately, once one has figured out how a function works there seems to
>> be very little incentive to make the docs clearer. I would estimate that
>> about every 20th pull request we get is a documentation fix, and more than
>> half of those are fixes of speling misstakes (which are great!).
>>
>> I've just come back from about two weeks of vacation and this discussion has
>> resulted in roughly 0 pull requests for changes in the documentation. Would
>> it be possible to steer this discussion into doing something instead of
>> talking about doing something? Yes the technology/layout is not perfect, but
>> as Loïc said, it is the content that matters the most.
>>
>> Lukas
>> // my own opinions
>>