[erlang-questions] Erlang documentation -- a modest proposal

Tue Sep 27 13:49:02 CEST 2016

On Tue, Sep 27, 2016 at 1:11 PM, Luke <random.outcomes@REDACTED> wrote:

> Joe, check out this startup - https://www.simpla.io/
>
> They allow editable web pages, which create blobs that get saved
> directly into a database.  No more digging through source files to edit
> docs, people (with appropriate permission) can edit the page live. It would
> be very straightforward to write a program to dump from the database to
> XML/JSON or whatever format you're interested in, making it portable, and
> fairly trivial to go the other way. (Full disclosure: It's not my startup
> but they're my mates, we went through the same accelerator).
>
> Is the raw XML documentation available anywhere?
>

The documentation source is in the otp git repository. Each application has
a doc/src directory with Makefile and .xml files.

The system documentation is under Root/system/doc.

There are a few exceptions where the source is a Markdown file rather than
XML.

/Kenneth

>
> Thanks,
> Luke
>
> On Tue, Sep 27, 2016 at 6:03 AM, Joe Armstrong <erlang@REDACTED> wrote:
>
>> Regarding the documentation - there is a step0 that needs to be done
>> *before* attacking the document improvement problem.
>>
>> The following things are needed (in my opinion)
>>
>> 1) The number of DTDs in the documentation should be reduced to ONE;
>>      there are currently 26, and these are in
>>        otp_src_<vsn>/lib/erl_docgen/priv/dtd
>>
>> 2) tags that are used infrequently should be removed and the source XML
>>     corrected. Some tags are virtually *never* used.
>>
>> 3) All XML files should validate with the new DTD
>>
>> These steps need quite a lot of work ...
>>
>> 4) The Erlang parser should be changed to exactly
>> reproduce the source.
>> Right now the parse tree of correct Erlang has all the comments
>> and white space removed. I'd suggest attaching the comments to the
>> following token (for example, {atom,Line,theAtom} should become
>>     {atom, Line, theAtom, "the preceding comments and white space"}).
>> It should be possible to *exactly* reconstruct the input from the parse
>> tree.
>>
>> <aside> - in the first Erlang all the different ways of writing an integer
>> ended up as the same token. So writing 16#fc was the same as writing the
>> integer 252 and tokenized as {integer,Line,252} - the tokenizer threw
>> away the exact input so it was impossible to reconstruct the source
>> from the token stream. Now it's better: 16#fc is tokenized as
>> {integer,[{location,{Line,Col}},{text,"16#fc"}], 252} - but comments
>> and white space are not retained in the parse tree.
>>
>> Note that changing the parse tree is *not* a simple hack - all tools that
>> depend upon the parse tree have to be changed.
>> </aside>
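
The token-level half of this round trip can already be sketched with the
standard scanner. The snippet below assumes erl_scan:string/3 with the
`return` and `text` options (keep white space and comment tokens, and record
each token's original text) and glues the texts back together with
erl_scan:text/1. The module and function names are made up for illustration.
Nothing here touches erl_parse, whose trees still drop comments and white
space, which is the gap point 4 describes.

```erlang
-module(roundtrip_sketch).
-export([rebuild/1, int_token/1]).

%% Scan Source keeping comments and white space ('return') and the
%% original text of every token ('text'), then concatenate the texts.
rebuild(Source) ->
    {ok, Tokens, _EndLoc} = erl_scan:string(Source, {1, 1}, [return, text]),
    lists:append([erl_scan:text(T) || T <- Tokens]).

%% Return {Value, OriginalText} for a source string that starts with
%% an integer literal, e.g. "16#fc".
int_token(Source) ->
    {ok, [{integer, _Anno, Value} = Tok | _], _EndLoc} =
        erl_scan:string(Source, {1, 1}, [text]),
    {Value, erl_scan:text(Tok)}.
```

Calling roundtrip_sketch:rebuild/1 on a source string should hand back the
same string, comments included, while int_token/1 shows that the value and
the spelling of an integer are both available on the token.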
>>
>>
>>
>>
>>
>> 5) We should decide how to attach "floating" comments in the source.
>> Does a comment *before* a function apply to the next function or not?
>>
>> 6) We need some "injection" API to inject code, meta-data, examples
>> and documentation into a database.
>>
>>    For example   inject:code("foo.erl") should inject a load of key/value
>> pairs into a data base, with something like the following keys
>>
>>      {text, Mod,Func,Arity} => the source code text
>>      {spec, Mod, Func, Arity} => the spec
>>      {doc, Mod,Func,Arity} => the documentation
>>      {examples,Mod,Func,Arity} => [Examples]
>>
>> The entities in the database should be sufficient to reconstruct the
>> original text, and perform various analysis of the functions.
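
A minimal sketch of the key scheme above, under several assumptions: the
module name inject_sketch is hypothetical, the function takes a module name
and source text rather than a file name, the "database" is just a map, and
only the {text,...} and {spec,...} keys are filled in. The stored text comes
from erl_pp, so it is pretty-printed and loses comments and layout, which is
exactly the limitation described in point 4.

```erlang
-module(inject_sketch).
-export([code/2]).

%% Parse Source and return a map keyed by {text|spec, Mod, Func, Arity}.
code(Mod, Source) ->
    {ok, Tokens, _EndLine} = erl_scan:string(Source),
    Forms = [begin {ok, F} = erl_parse:parse_form(Ts), F end
             || Ts <- split_forms(Tokens, [], [])],
    lists:foldl(fun(Form, Db) -> add_form(Mod, Form, Db) end, #{}, Forms).

%% Split the token list into one list per form; each form ends at a dot.
split_forms([], [], Acc) ->
    lists:reverse(Acc);
split_forms([], Cur, Acc) ->
    lists:reverse([lists:reverse(Cur) | Acc]);
split_forms([{dot, _} = Dot | Rest], Cur, Acc) ->
    split_forms(Rest, [], [lists:reverse([Dot | Cur]) | Acc]);
split_forms([Tok | Rest], Cur, Acc) ->
    split_forms(Rest, [Tok | Cur], Acc).

%% Store function bodies and specs; ignore every other form.
add_form(Mod, {function, _, Name, Arity, _} = Form, Db) ->
    Db#{{text, Mod, Name, Arity} => lists:flatten(erl_pp:form(Form))};
add_form(Mod, {attribute, _, spec, {{Name, Arity}, _}} = Form, Db) ->
    Db#{{spec, Mod, Name, Arity} => lists:flatten(erl_pp:form(Form))};
add_form(_Mod, _Other, Db) ->
    Db.
```

For example, inject_sketch:code(foo, "-spec inc(integer()) -> integer().\ninc(N) -> N + 1.\n")
should return a map with the keys {spec,foo,inc,1} and {text,foo,inc,1}.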
>>
>> I think *most* of the problems involved are due to the difficulty of
>> extracting information from the source files and editing this when it is
>> wrong.
>>
>> I'm currently trying to do parts of this.
>>
>> A "relatively simple" program should be able to (for a given function)
>>
>>     - find the exact source text
>>     - find all old versions
>>     - find the specs and types referred to
>>     - find the documentation
>>     - find the test cases
>>
>> Doing so involves analysing the Erlang sources, the XML sources and the
>> test cases, and involves a good deal of guesswork.
>>
>> All this must be done on a moving target and should not break the
>> existing system.
>>
>> I suspect that the code to accurately manipulate the code and
>> documentation has been written several times in different
>> projects (for example, Wrangler and the Eclipse interface both need
>> to manipulate the source in various ways).
>>
>>
>>
>>
>> On Mon, Sep 26, 2016 at 6:41 PM,  <lloyd@REDACTED> wrote:
>> > Joe,
>> >
>> > You've said so well what I've been trying to harp on.
>> >
>> > My most recent timesink has been trying to understand xmerl
>> sufficiently well to pull book data out of several different book APIs.
>> Dave Thomas's 2007 tutorial has been a big help, but the black holes in my
>> understanding still significantly impede my progress. So far I've spent
>> maybe 10 to 15 hours trying to scope it out. I can get much of what I need
>> from Amazon's APIs, but I need a redundant source. The Library of Congress
>> API completely eludes me; I get a little further with ISBNdb, but still not
>> far enough.
>> >
>> > Given discussion on the documentation thread to date, it seems to me
>> that there are four issues at stake:
>> >
>> > 1) Content deficiencies
>> > 2) Formatting issues
>> > 3) Lack of consensus of what we, as a community, want
>> > 4) How we move forward toward comprehensive improvement of
>> documentation.
>> >
>> > Lukas Larsson's most recent post makes a good point.
>> >
>> > Bruce Yinhe tells me in a private post that his group is about to hire
>> one person on a part-time basis to  work on documentation improvements.
>> >
>> > I've lost it in the thread, but as I recall we had some promising
>> interest in documentation improvements from an Ericsson employee.
>> >
>> > It would be great if we could begin to rally around these comments and
>> find some kind of convergence toward progress.
>> >
>> > My take is to break the large task down into small chunks, bring the
>> intelligence and resources of the community to bear on one specific issue
>> at a time, and get it done.
>>
>> I think many (most) of the problems arise because what we are ultimately
>> doing is changing the content of a file at some place.
>>
>> Fixing a typo/bug involves
>>
>>    1 - finding the appropriate file
>>    2 - changing the file at the appropriate place
>>    3 - updating the file (somehow)
>>    4 - generating the downstream documents that depend upon the file
>>
>> All these steps are difficult.
>>
>> We can imagine a simpler way:
>>
>> Suppose a file is a sequence of paragraphs. Each paragraph
>> has a GUID.
>>
>> In (say) HTML
>>
>>   <p   guid = "b92a2705-3449-4fb9-8f11-fa55f7ead29f">
>>      This is my paragraph ...
>>   </p>
>>
>> If I want to update the paragraph I just send a message
>>
>>     {update,"b92a2705-3449-4fb9-8f11-fa55f7ead29f",
>>        "the new content"}
>>
>> to some server - this should be checked (manually) and then
>> if approved used to update everything.
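
The update-then-approve flow can be sketched without any server at all. The
following is only an illustration under stated assumptions: the module name
para_store, the function names, and the map layout are all invented here,
and "manual checking" is reduced to an explicit approve/2 call.

```erlang
-module(para_store).
-export([new/1, submit/2, approve/2]).

%% State holds the live paragraphs and the updates awaiting approval,
%% both keyed by GUID.
new(Paras) when is_map(Paras) ->
    #{paras => Paras, pending => #{}}.

%% Queue an {update, Guid, NewText} message; nothing is changed yet.
submit({update, Guid, NewText}, #{paras := Paras, pending := Pending} = S) ->
    case maps:is_key(Guid, Paras) of
        true  -> {ok, S#{pending := Pending#{Guid => NewText}}};
        false -> {error, unknown_guid}
    end.

%% A reviewer approves the queued update for Guid, which applies it.
approve(Guid, #{paras := Paras, pending := Pending} = S) ->
    case maps:take(Guid, Pending) of
        {NewText, Rest} ->
            {ok, S#{paras := Paras#{Guid := NewText}, pending := Rest}};
        error ->
            {error, nothing_pending}
    end.
```

Regenerating downstream documents after approve/2 succeeds is left out; the
point is only that a GUID names exactly one paragraph, so an update cannot
hit the wrong one.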
>>
>> In other threads I have argued that *everything* should be in a global
>> database with a huge DHT tracking where things are.
>>
>> A key to "changing things" is "naming things" and "finding things"
>>
>> Yet another (even simpler) alternative is to have a message
>>
>>     {change,SHA,NewText}
>>
>> Meaning "change the paragraph with sha1 checksum <SHA> to <NewText>"
>>
>> Implementing this is easy - BUT all paragraphs with the same SHA
>> would be changed - which might not be what we want.
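
The caveat about identical paragraphs is easy to see in a small sketch. The
module and function names below are hypothetical, paragraphs are plain
strings held in a list, and the checksum is computed with crypto:hash(sha, ...)
from the standard crypto application.

```erlang
-module(sha_change).
-export([sha/1, change/2]).

%% SHA-1 checksum of a paragraph, as a 20-byte binary.
sha(Text) ->
    crypto:hash(sha, unicode:characters_to_binary(Text)).

%% Apply {change, Sha, NewText} to a list of paragraphs. Every paragraph
%% whose checksum equals Sha is replaced, not just the first one.
change({change, Sha, NewText}, Paragraphs) ->
    [case sha(P) =:= Sha of
         true  -> NewText;
         false -> P
     end || P <- Paragraphs].
```

Given the paragraphs ["a", "b", "a"] and the message
{change, sha_change:sha("a"), "z"}, both copies of "a" are replaced, which is
the behaviour warned about above.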
>>
>> I have (incidentally) experimented with this - tagging all paragraphs
>> with their SHAs and sticking the results in a database.
>>
>> <crazy idea>
>> make a server that accepts messages of the form
>>
>>    {change, <SHA>, <New Text>}
>>
>> The server finds a paragraph with sha1 checksum <SHA> and changes
>> it to <New Text> - it changes the appropriate file in a Git archive,
>> does all the add/commit/push magic, and the job is done.
>>
>> (I think I'll implement this for fun :-)
>>
>> </crazy idea>
>>
>> >
>> > I don't have the technical chops, but I'll gladly work with you or
>> anyone else to address content issues on a module-by-module basis. I can
>> ask Micky-the_Dunce questions and, perhaps, help clarify language. It would
>> be great if you could help clarify intent and application issues.
>> >
>> > All the best,
>> >
>> > Lloyd
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: "Joe Armstrong" <erlang@REDACTED>
>> > Sent: Monday, September 26, 2016 6:42am
>> > To: "Lukas Larsson" <lukas@REDACTED>
>> > Cc: "Lloyd R. Prentice" <lloyd@REDACTED>, "
>> erlang-questions@REDACTED" <erlang-questions@REDACTED>
>> > Subject: Re: [erlang-questions] Erlang documentation -- a modest
>> proposal
>> >
>> > I think what I miss most are *examples*
>> >
>> > I've just been reading the edoc manual pages for a program
>> > called <XYZ> (name changed to avoid embarrassment)
>> >
>> > The functions are well documented - the types are well documented
>> > but I haven't a clue about which ORDER to call the functions
>> >
>> > Imagine a file system.
>> >
>> > We *document* the open, read, write, and close functions
>> > but we don't say you have to open the file before you read it.
>> > We don't say that when we're done we have to close the file.
>> >
>> > We don't say this because it is *obvious*
>> >
>> > But for the module glonk, which exports, zizzle, taddle, glonk and plonk
>> > it is NOT obvious. Yes sure you all know you have to call glonk 3 times
>> > before calling plonk - but I don't know.
>> >
>> > That's why we need examples.
>> >
>> > Often I search for a tutorial and find a ten line blog posting that
>> > actually shows me how to use a library - this gets me started.
>> >
>> > very short unit tests - placed inline are *very* useful
>> >
>> > for example:
>> >
>> >      "321" = lists:reverse("123")
>> >
>> > The unit tests *are* the examples - what we don't have is software that
>> > parses the code, parses the documentation, parses the unit tests
>> > and munges it all together into a form that is convenient to read.
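
One small piece of such tooling, checking that an inline example still
holds, can be sketched with the standard erl_scan, erl_parse and erl_eval
modules. The module name example_check is hypothetical, and the example is
assumed to be a string of Erlang expressions ending in a full stop.

```erlang
-module(example_check).
-export([check/1]).

%% Evaluate an example such as "\"321\" = lists:reverse(\"123\")." and
%% report whether its pattern match succeeds.
check(Example) ->
    {ok, Tokens, _EndLine} = erl_scan:string(Example),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    try erl_eval:exprs(Exprs, erl_eval:new_bindings()) of
        {value, _Value, _Bindings} -> ok
    catch
        %% A failed match means the documented result is out of date.
        error:{badmatch, Got} -> {failed, Got}
    end.
```

A documentation build could run check/1 over every example it extracts and
flag the ones that no longer match the code.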
>> >
>> > I'm actually trying to write something like this now - hence my wails
>> > of anguish over css.
>> >
>> > Wish me luck
>> >
>> > /Joe
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Sep 26, 2016 at 11:58 AM, Lukas Larsson <lukas@REDACTED>
>> wrote:
>> >> Hello,
>> >>
>> >> On Thu, Sep 22, 2016 at 11:56 PM, Lloyd R. Prentice <
>> lloyd@REDACTED>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> To date, this thread has generated quite a few worthwhile insights and
>> >>> ideas. My fear is that they will be deep-sixed into the archive. On
>> the
>> >>> other hand, major revision is a daunting task and unlikely to happen.
>> >>>
>> >>> But maybe we can focus on specific issues and make iterative headway.
>> >>>
>> >>> Fewer than half of the functions in the lists library, for instance,
>> >>> have code examples. Suppose over the span of one week we were to
>> >>> collectively focus on generating at least two code examples for each
>> >>> function in one library.
>> >>>
>> >>> At the end of the week we could organize the submissions and vote on
>> best
>> >>> candidates for inclusion in the docs. That done, we can pick another
>> module.
>> >>>
>> >>> Thus, with not much effort from any one individual, a small posse of
>> >>> volunteer Erlang wizards could make short work of deficiencies in the
>> docs.
>> >>>
>> >>> Anyway, it's an idea.
>> >>>
>> >>> All the best,
>> >>>
>> >>> LRP
>> >>>
>> >>
>> >> I think that it is great to see everyone talking about wanting to
>> >> improve the documentation. The contributions to the Erlang/OTP project
>> >> that I value the most are documentation changes that make the intention
>> >> clearer, or explain some corner case somewhere which the docs did not
>> >> initially mention.
>> >>
>> >> Unfortunately, once one has figured out how a function works there
>> seems to
>> >> be very little incentive to make the docs clearer. I would estimate
>> that
>> >> about every 20th pull request we get is a documentation fix, and more
>> than
>> >> half of those are fixes of speling misstakes (which are great!).
>> >>
>> >> I've just come back from about two weeks of vacation and this
>> discussion has
>> >> resulted in roughly 0 pull requests for changes in the documentation.
>> Would
>> >> it be possible to steer this discussion into doing something instead of
>> >> talking about doing something? Yes the technology/layout is not
>> perfect, but
>> >> as Loïc said, it is the content that matters the most.
>> >>
>> >> Lukas
>> >> // my own opinions
>> >>
>> >> _______________________________________________
>> >> erlang-questions mailing list
>> >> erlang-questions@REDACTED
>> >> http://erlang.org/mailman/listinfo/erlang-questions
>> >>
>> >
>> >
>>
>
>