[erlang-questions] Erlang documentation -- a modest proposal

Tue Sep 27 13:11:41 CEST 2016

Joe, check out this startup - https://www.simpla.io/

They allow editable web pages, which create blobs that get saved directly
into a database.  No more digging through source files to edit docs, people
(with appropriate permission) can edit the page live. It would be very
straightforward to write a program to dump from the database to XML/JSON or
whatever format you're interested in, making it portable, and fairly
trivial to go the other way. (Full disclosure: It's not my startup but
they're my mates, we went through the same accelerator).

Is the raw XML documentation available anywhere?

Thanks,
Luke

On Tue, Sep 27, 2016 at 6:03 AM, Joe Armstrong <erlang@REDACTED> wrote:

> Regarding the documentation - there is a step 0 that needs to be done
> *before* attacking the document improvement problem.
>
> The following things are needed (in my opinion):
>
> 1) The number of DTDs in the documentation should be reduced to ONE.
>      There are currently 26; they are in
>        otp_src_<vsn>/lib/erl_docgen/priv/dtd
>
> 2) tags that are used infrequently should be removed and the source XML
>     corrected. Some tags are virtually *never* used.
>
> 3) All XML files should validate with the new DTD
>
> These steps need quite a lot of work ...
>
> 4) The Erlang parser should be changed so that it can exactly
> reproduce the source.
> Right now the parse tree of correct Erlang has all the comments
> and white space removed. I'd suggest attaching the comments to the
> next token (for example {atom,Line,theAtom} should become
>     {atom, Line, theAtom, "the preceding comments and white space"}).
> It should be possible to *exactly* reconstruct the input from the parse
> tree.
>
> <aside> - in the first Erlang all the different ways of writing an integer
> ended up as the same token. So writing 16#fc was the same as writing the
> integer 252 and tokenized as {integer,Line,252} - the tokenizer threw
> away the exact input so it was impossible to reconstruct the source
> from the token stream. Now it's better: 16#fc is tokenized as
> {integer,[{location,{Line,Col}},{text,"16#fc"}],252} - but comments
> and white space are not retained in the parse tree.
>
> Note: changing the parse tree is *not* a simple hack - all tools that
> depend upon the parse tree would have to be changed.
> </aside>
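At the token level, exact reconstruction is already possible. A minimal sketch, assuming OTP 18 or later: erl_scan:string/3 with the return option (keep comment and white_space tokens) and the text option (keep each token's exact source text) lets the input be rebuilt by concatenating erl_scan:text/1 of every token. The module name roundtrip is made up for illustration. erl_parse does not accept comment and white_space tokens, so they must still be filtered out before parsing. That is the gap described above.

```erlang
-module(roundtrip).
-export([tokens/1, reconstruct/1]).

%% Scan Src keeping comments and white space (return) and the exact
%% source text of every token (text).
tokens(Src) ->
    {ok, Toks, _EndLocation} = erl_scan:string(Src, {1, 1}, [return, text]),
    Toks.

%% Concatenating the text of each token gives back the original input.
reconstruct(Src) ->
    lists:append([erl_scan:text(T) || T <- tokens(Src)]).
```

For "X = 16#fc. % hex\n" the reconstruction is the identical string, and the integer token still carries the value 252 alongside its original text "16#fc".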
>
> 5) We should decide how to attach "floating" comments in the source.
> Does a comment *before* a function apply to the next function or not?
>
> 6) We need some "injection" API to inject code, metadata, examples
> and documentation into a database.
>
>    For example, inject:code("foo.erl") should inject a load of key/value
> pairs into a database, with something like the following keys:
>
>      {text, Mod, Func, Arity}     => the source code text
>      {spec, Mod, Func, Arity}     => the spec
>      {doc, Mod, Func, Arity}      => the documentation
>      {examples, Mod, Func, Arity} => [Examples]
>
> The entities in the database should be sufficient to reconstruct the
> original text, and to perform various analyses of the functions.
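A minimal sketch of the text and spec part of such an injection step, assuming the "database" is an Erlang map and the source is already in a string. inject_sketch:from_string/2 is a hypothetical name, not an existing OTP API, and the doc and examples keys are left out. Each stored form keeps its preceding comments and white space, in the spirit of the token proposal in point 4.

```erlang
-module(inject_sketch).
-export([from_string/2]).

%% Build #{{text | spec, Mod, Func, Arity} => exact source text}.
from_string(Mod, Src) ->
    {ok, Toks, _} = erl_scan:string(Src, {1, 1}, [return, text]),
    lists:foldl(fun(Form, Acc) -> index(Mod, Form, Acc) end,
                #{}, split_forms(Toks, [], [])).

%% Split the token list into forms, each ending with a dot token.
%% Leading comments/white space stay attached to the following form.
split_forms([], [], Forms) -> lists:reverse(Forms);
split_forms([], Cur, Forms) -> lists:reverse(Forms, [lists:reverse(Cur)]);
split_forms([{dot, _} = T | Rest], Cur, Forms) ->
    split_forms(Rest, [], [lists:reverse([T | Cur]) | Forms]);
split_forms([T | Rest], Cur, Forms) ->
    split_forms(Rest, [T | Cur], Forms).

index(Mod, FormToks, Acc) ->
    Text = lists:append([erl_scan:text(T) || T <- FormToks]),
    %% erl_parse cannot handle comment/white_space tokens, so drop them.
    Code = [T || T <- FormToks, not is_layout(T)],
    case Code =/= [] andalso erl_parse:parse_form(Code) of
        {ok, {function, _, Name, Arity, _}} ->
            Acc#{{text, Mod, Name, Arity} => Text};
        {ok, {attribute, _, spec, {{Name, Arity}, _}}} ->
            Acc#{{spec, Mod, Name, Arity} => Text};
        _ ->
            Acc   % other attributes, trailing comments, parse errors
    end.

is_layout({comment, _, _}) -> true;
is_layout({white_space, _, _}) -> true;
is_layout(_) -> false.
```

Keeping the exact text per form, rather than pretty-printing the parse tree, is what makes the reconstruction requirement above achievable.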
>
> I think *most* of the problems involved are due to the difficulty of
> extracting information from the source files and editing this when it is
> wrong.
>
> I'm currently trying to do parts of this.
>
> A "relatively simple" program should be able to (for a given function):
>
>     - find the exact source text
>     - find all old versions
>     - find the specs and types referred to
>     - find the documentation
>     - find the test cases
>
> Doing so involves analysing the Erlang sources, the XML sources and the
> test cases, and involves a good deal of guesswork.
>
> All this must be done on a moving target and should not break the
> existing system.
>
> I suspect that the code to accurately manipulate the code and
> documentation has been written several times in different
> projects (for example Wrangler and the Eclipse interface, both of
> which need to manipulate the source in various ways).
>
> On Mon, Sep 26, 2016 at 6:41 PM,  <lloyd@REDACTED> wrote:
> > Joe,
> >
> > You've said so well what I've been trying to harp on.
> >
> > My most recent timesink has been trying to understand xmerl sufficiently
> > well to pull book data out of several different book APIs. Dave Thomas's
> > 2007 tutorial has been a big help, but the black holes in my understanding
> > still significantly impede my progress. So far I've spent maybe 10 to 15
> > hours trying to scope it out. I can get much of what I need from Amazon's
> > APIs, but I need a redundant source. The Library of Congress API completely
> > eludes me; I get a little further with ISBNdb, but still not far enough.
> >
> > Given the discussion on the documentation thread to date, it seems to me
> > that there are four issues at stake:
> >
> > 1) Content deficiencies
> > 2) Formatting issues
> > 3) Lack of consensus on what we, as a community, want
> > 4) How we move forward toward comprehensive improvement of documentation.
> >
> > Lukas Larsson's most recent post makes a good point.
> >
> > Bruce Yinhe tells me in a private post that his group is about to hire
> > one person on a part-time basis to work on documentation improvements.
> >
> > I've lost it in the thread, but as I recall we had some promising
> > interest in documentation improvements from an Ericsson employee.
> >
> > It would be great if we could begin to rally around these comments and
> > find some kind of convergence toward progress.
> >
> > My take is to break the large task down into small chunks, bring the
> > intelligence and resources of the community to bear on one specific issue
> > at a time, and get it done.
>
> I think many (most) of the problems arise because what we are ultimately
> doing is changing the content of a file at some place.
>
> Fixing a typo/bug involves:
>
>    1 - finding the appropriate file
>    2 - changing the file at the appropriate place
>    3 - updating the file (somehow)
>    4 - generating the downstream documents that depend upon the file
>
> All these steps are difficult
>
> We can imagine a simpler way:
>
> Suppose a file is a sequence of paragraphs. Each paragraph
> has a GUID.
>
> In (say) HTML
>
>   <p   guid = "b92a2705-3449-4fb9-8f11-fa55f7ead29f">
>      This is my paragraph ...
>   </p>
>
> If I want to update a paragraph I just send a message
>
>     {update, "b92a2705-3449-4fb9-8f11-fa55f7ead29f",
>        "the new content"}
>
> to some server - this should be checked (manually) and then,
> if approved, used to update everything.
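A minimal sketch of the bookkeeping such a server could wrap, assuming paragraphs are held in a map keyed by GUID and that a proposed change only takes effect once someone calls approve/2. The module para_store and its functions are hypothetical. In a real system a gen_server would hold this state and receive the {update, Guid, NewText} messages.

```erlang
-module(para_store).
-export([new/1, propose/3, approve/2, text/2]).

%% Paragraphs keyed by GUID, plus proposed edits awaiting approval.
new(Paras) ->
    #{paras => maps:from_list(Paras), pending => #{}}.

%% {update, Guid, NewText}: record the change but do not apply it yet.
propose(Guid, NewText, #{paras := P, pending := Q} = S) ->
    case maps:is_key(Guid, P) of
        true  -> {ok, S#{pending := Q#{Guid => NewText}}};
        false -> {error, unknown_guid}
    end.

%% A reviewer approves: move the pending text into the document.
approve(Guid, #{paras := P, pending := Q} = S) ->
    case maps:take(Guid, Q) of
        {NewText, Q1} -> {ok, S#{paras := P#{Guid := NewText}, pending := Q1}};
        error         -> {error, nothing_pending}
    end.

text(Guid, #{paras := P}) ->
    maps:get(Guid, P).
```

Because the GUID, not the content, is the key, two paragraphs with identical text can still be changed independently.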
>
> In other threads I have argued that *everything* should be in a global
> database with a huge DHT tracking where things are.
>
> A key to "changing things" is "naming things" and "finding things"
>
> Yet another (even simpler) alternative is to have a message
>
>     {change,SHA,NewText}
>
> Meaning "change the paragraph with sha1 checksum <SHA> to <NewText>"
>
> Implementing this is easy - BUT all paragraphs with the same SHA
> would be changed - which might not be what we want.
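A minimal sketch of the SHA-addressed variant, assuming paragraphs are held as a list of strings and hashed with crypto:hash(sha, Para), which needs the crypto application. The module sha_change is hypothetical. It returns how many paragraphs it touched, which makes the duplicate-paragraph caveat visible.

```erlang
-module(sha_change).
-export([sha/1, change/3]).

%% SHA-1 of a paragraph, as a 20-byte binary.
sha(Para) ->
    crypto:hash(sha, Para).

%% Replace every paragraph whose SHA-1 equals Sha. Returns
%% {NewParas, Count}; Count > 1 means duplicated text was changed too.
change(Sha, NewText, Paras) ->
    lists:mapfoldl(fun(P, N) ->
                           case sha(P) =:= Sha of
                               true  -> {NewText, N + 1};
                               false -> {P, N}
                           end
                   end, 0, Paras).
```

With ["See also lists:map/2.", "Intro", "See also lists:map/2."], changing by the hash of the first paragraph rewrites both copies and reports a count of 2.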
>
> I have (incidentally) experimented with this - tagging all paragraphs
> with their SHAs and sticking the results in a database.
>
> <crazy idea>
> make a server that accepts messages of the form
>
>    {change, <SHA>, <New Text>}
>
> The server finds a paragraph with sha1 checksum <SHA> and changes
> it to <New Text> - it changes the appropriate file in a Git repository,
> does all the add/commit/push magic and the job is done.
>
> (I think I'll implement this for fun :-)
>
> </crazy idea>
>
> >
> > I don't have the technical chops, but I'll gladly work with you or
> > anyone else to address content issues on a module-by-module basis. I can
> > ask Micky-the_Dunce questions and, perhaps, help clarify language. It would
> > be great if you could help clarify intent and application issues.
> >
> > All the best,
> >
> > Lloyd
> >
> > -----Original Message-----
> > From: "Joe Armstrong" <erlang@REDACTED>
> > Sent: Monday, September 26, 2016 6:42am
> > To: "Lukas Larsson" <lukas@REDACTED>
> > Cc: "Lloyd R. Prentice" <lloyd@REDACTED>, "erlang-questions@REDACTED" <erlang-questions@REDACTED>
> > Subject: Re: [erlang-questions] Erlang documentation -- a modest proposal
> >
> > I think what I miss most are *examples*
> >
> > I've just been reading the edoc manual pages for a program
> > called <XYZ> (name changed to avoid embarrassment)
> >
> > The functions are well documented - the types are well documented -
> > but I haven't a clue in which ORDER to call the functions.
> >
> > Imagine a file system.
> >
> > We *document* the open, read, write, and close functions
> > but we don't say you have to open the file before you read it.
> > We don't say that when we're done we have to close the file.
> >
> > We don't say this because it is *obvious*
> >
> > But for the module glonk, which exports, zizzle, taddle, glonk and plonk
> > it is NOT obvious. Yes sure you all know you have to call glonk 3 times
> > before calling plonk - but I don't know.
> >
> > That's why we need examples.
> >
> > Often I search for a tutorial and find a ten line blog posting that
> > actually shows me how to use a library - this gets me started.
> >
> > Very short unit tests, placed inline, are *very* useful,
> >
> > for example:
> >
> >      "321" = lists:reverse("123")
> >
> > The unit tests *are* the examples - what we don't have is software that
> > parses the code, parses the documentation, parses the unit test
> > and munges all together into a form that is convenient to read.
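A minimal sketch of the "parse the unit tests" step, assuming each example is a single Erlang expression string ending in a full stop, like the lists:reverse example above. It uses erl_scan:string/1, erl_parse:parse_exprs/1 and erl_eval:exprs/2. The module doc_examples is a hypothetical name. A real tool would also have to pull the examples out of the sources and the documentation.

```erlang
-module(doc_examples).
-export([check/1]).

%% Evaluate each example string and report whether it holds.
check(Examples) ->
    [{Ex, run(Ex)} || Ex <- Examples].

run(Ex) ->
    {ok, Toks, _} = erl_scan:string(Ex),
    {ok, Exprs} = erl_parse:parse_exprs(Toks),
    try erl_eval:exprs(Exprs, erl_eval:new_bindings()) of
        {value, _Value, _Bindings} -> ok
    catch
        %% A failing example like "123" = lists:reverse("123")
        %% raises badmatch carrying the actual right-hand side.
        error:{badmatch, Actual} -> {failed, Actual};
        Class:Reason -> {error, {Class, Reason}}
    end.
```

Reporting the actual value on a failed match gives the reader the correct result for free, which is itself a useful example.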
> >
> > I'm actually trying to write something like this now - hence my wails
> > of anguish over css.
> >
> > Wish me luck
> >
> > /Joe
> >
> > On Mon, Sep 26, 2016 at 11:58 AM, Lukas Larsson <lukas@REDACTED> wrote:
> >> Hello,
> >>
> >> On Thu, Sep 22, 2016 at 11:56 PM, Lloyd R. Prentice <lloyd@REDACTED>
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> To date, this thread has generated quite a few worthwhile insights and
> >>> ideas. My fear is that they will be deep-sixed into the archive. On the
> >>> other hand, major revision is a daunting task and unlikely to happen.
> >>>
> >>> But maybe we can focus on specific issues and make iterative headway.
> >>>
> >>> Fewer than half of the functions in the lists library, for instance,
> >>> have code examples. Suppose that over the span of one week we were to
> >>> collectively focus on generating at least two code examples for each
> >>> function in one library.
> >>>
> >>> At the end of the week we could organize the submissions and vote on
> >>> the best candidates for inclusion in the docs. That done, we could pick
> >>> another module.
> >>>
> >>> Thus, with not much effort from any one individual, a small posse of
> >>> volunteer Erlang wizards could make short work of deficiencies in the
> >>> docs.
> >>>
> >>> Anyway, it's an idea.
> >>>
> >>> All the best,
> >>>
> >>> LRP
> >>>
> >>
> >> I think that it is great to see everyone talking about wanting to improve
> >> the documentation. The contributions to the Erlang/OTP project that I value
> >> the most are documentation changes that make the intention clearer, or
> >> explain some corner case somewhere which the docs did not initially
> >> mention.
> >>
> >> Unfortunately, once one has figured out how a function works there seems to
> >> be very little incentive to make the docs clearer. I would estimate that
> >> about every 20th pull request we get is a documentation fix, and more than
> >> half of those are fixes of speling misstakes (which are great!).
> >>
> >> I've just come back from about two weeks of vacation and this discussion has
> >> resulted in roughly 0 pull requests for changes in the documentation. Would
> >> it be possible to steer this discussion into doing something instead of
> >> talking about doing something? Yes, the technology/layout is not perfect, but
> >> as Loïc said, it is the content that matters the most.
> >>
> >> Lukas
> >> // my own opinions
> >>
> >> _______________________________________________
> >> erlang-questions mailing list
> >> erlang-questions@REDACTED
> >> http://erlang.org/mailman/listinfo/erlang-questions
> >>
> >
> >