\documentclass{slides}
% Nothing fancy needed, just straight-out-of-the-box LaTeX.
\begin{document}
\begin{slide}
\begin{center}\large
Erlang/OTP User Conference 2000
\end{center}
\vskip 1cm
\begin{center}\Large
An Erlang DTD
\end{center}
\vskip 2cm
\begin{center}\large
Richard A. O'Keefe\\
CS, University of Otago
\end{center}
\end{slide}
\begin{slide}
Erlang/SGML is not an SGML parser for or in Erlang.
If you want one, parse the ESIS output of an `sgmls' or
`nsgmls' process (dead easy).
Or you could
pick up Jan Wielemaker's sgml package for SWI Prolog
(the parser core is in C; it calls foreign interface
code to build Prolog data structures) and adapt that
(considerably faster than nsgmls but less capable).
Erlang/SGML is not an XML parser either (I've done one 10--20\%
faster than expat in 950 lines of C).
\end{slide}
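\begin{slide}
A sketch of the ESIS route, assuming the usual `nsgmls' output
conventions: one event per line, where \verb|(| opens an element,
\verb|)| closes it, \verb|A| carries an attribute, and \verb|-|
carries data. The module and function names are made up for
illustration; escapes inside data lines are left undecoded.
\begin{verbatim}
-module(esis_sketch).
-export([event/1]).

%% Classify one line of ESIS output.
event("(" ++ GI)   -> {start_tag, GI};
event(")" ++ GI)   -> {end_tag, GI};
event("-" ++ Data) -> {data, Data};
event("A" ++ Attr) -> {attribute, Attr};
event(Other)       -> {other, Other}.
\end{verbatim}
So \verb|esis_sketch:event("(slide")| yields
\verb|{start_tag,"slide"}|.
\end{slide}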
\begin{slide}
Erlang/SGML is not primarily a literate programming
tool for Erlang. Unlike most LP tools, it cannot generate
multiple source files from a single document. Currently it
does not handle fragments, which Erlang has little need of.
It \emph{does} handle cross-referencing.
Noweb, Nuweb, and FunnelWeb work perfectly well with
Erlang, so we don't need a new LP tool for Erlang.
If we did want one, there's Kristina Sirhuber's YERL.
\end{slide}
\begin{slide}
It's all about document indexing to support the
maintenance process.
{\large
Hypothesis:
\begin{quote}
Maintenance programmers seeking information or
trying to figure out the consequences of a change
have to find related documents.
Improving that step should help maintenance.
\end{quote}}
\end{slide}
\begin{slide}
Isn't ``maintenance'' a rather old-fashioned way
of thinking about a neat fast-development language like
Erlang?
No. Faster development means that the program is useful
sooner, which just means that the maintenance phase is longer.
(``Longer'' is not the same as ``more expensive''.)
Extreme Programming involves continuing redesign,
which
absolutely requires good tools for finding related places
in related documents easily and quickly.
\end{slide}
\begin{slide}
Erlang/SGML is part of the Large Scale Erlang project.
Approaches that work well for 300 kSLOC do not scale to
100 MSLOC.
One major requirement is a water-tight module system.
(I offered two papers this year; that would have been number 3.)
Another requirement is more contextual information, so that
far less human filtering is required.
See the ``SIF'' problem later.
\end{slide}
\begin{slide}
What are SGML and XML anyway?
SGML is a language in which one may define document grammars,
with rules like
\begin{verbatim}
\end{verbatim}
\end{slide}
\begin{slide}
Abstractly, a document is a tree, where each node is either a
text node or a labelled node; a labelled node has a label,
a set of attribute=value pairs, and a possibly empty sequence
of children. Nodes may be given unique identifiers, which are
used for cross-references within a tree.
Developing a document grammar is basically an exercise in
information modelling, rather like the conceptual modelling
phase of the Unified Process.
There are several semi-formal models of the information content
of a document. The SGML standard has Element Structure Information Sets;
HyTime and DSSSL have Graph Representations Of property ValuEs.
XML has several, not quite mutually compatible,
but all broadly similar to SGML's.
\end{slide}
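\begin{slide}
One way to picture such a tree in Erlang terms, as a sketch only:
a text node is \verb|{text,String}| and a labelled node is
\verb|{Label,Attrs,Children}|. This representation and the
function name are assumptions, not part of Erlang/SGML.
\begin{verbatim}
-module(doc_tree).
-export([find/2]).

%% Collect every node labelled Label.
find(_, {text, _}) -> [];
find(Label, {L, _, Kids} = Node) ->
    Below = lists:flatmap(
              fun(K) -> find(Label, K) end,
              Kids),
    case L of
        Label -> [Node | Below];
        _     -> Below
    end.
\end{verbatim}
A search by label sees only the nodes carrying that label,
whatever text the other nodes happen to contain.
\end{slide}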
\begin{slide}
\begin{tabular}{ll}
SGML&
XML\\
\hline
writable&
need machine help\\
readable&
need machine help\\
concise&
amazingly bulky\\
standard, stable&
W3C, grows fast\\
& (Namespaces, XBase, \dots)\\
tricky to parse&
trivial to parse\\
easy to process&
harder to process\\
\hline
\end{tabular}
The Erlang SGML DTD and the Erlang XML DTD both include the
same grammar file; the SGML version enables human readability
features such as tag omission and short reference strings.
Erlang source: 39k; stripped: 21k;\\
as SGML: 43k; as XML: 104k.
\end{slide}
\begin{slide}
Erlang/SGML expresses
\begin{itemize}\parskip=0pt\topsep=0pt
\item text (based on HTML with strong influence from
TEI, but reasonable restraint)
\item tables (based on HTML4 but not as powerful)
\item mathematics (not ISO 12083, not HTML 3.0, most certainly
not MathML which is not meant for human beings, but
home brew; \emph{nearly} as concise as \LaTeX{} but stricter)
\item pictures (images via Encapsulated PostScript, diagrams via Pic)
\item \dots continued \dots
\end{itemize}
\end{slide}
\begin{slide}
Erlang/SGML expresses
\begin{itemize}\parskip=0pt\topsep=0pt
\item Erlang source code (enforces all syntactic constraints
except for what's allowed in guards; could do that too)
\item glossaries
\item indices
\item Dublin Core metadata
\item examples
\end{itemize}
\end{slide}
\begin{slide}
Status
\begin{itemize}\parskip=0pt\topsep=0pt
\item Erlang source code $\rightarrow$ markup:\\
done in Prolog. Complete.
\item markup $\rightarrow$ Erlang source code:\\
done in Prolog. Complete.
\item parser:\\
SP `nsgmls' (a wee bit too complex for `sgmls'), SWI `sgml'.
\item SGML to XML conversion:\\
done by `nsgmls'.
\item document searching:\\
LT XML toolkit (good) or sgrep (poor).
\item \dots
\end{itemize}
\end{slide}
\begin{slide}
\dots Status:
\begin{itemize}\parskip=0pt\topsep=0pt
\item editing:\\
by hand (own editor) or Emacs SGML mode (psgml).
Free XML editors are disappointing.
\item formatting:\\
patchwork of AWK scripts; to be redone in Prolog.
\item manual:\\
in \LaTeX; woefully incomplete but slowly growing.
\end{itemize}
Why Prolog? Because I have Prolog on my home Macintosh but not Erlang.
Also because of the SWI Prolog SGML kit.
Should migrate to Erlang when complete.
\end{slide}
\begin{slide}
The context problem: is `SIF'
\begin{itemize}\topsep=0pt\parskip=0pt
\item the name of a Goddess?
\item part of a word (`SIFTER' perhaps)?
\item a file extension (Smalltalk Interchange\\Format)?
\item some other acronym?
\item a tag used inside some data structure?
\item a module name?
\item a function name without its module\\and arity?
\item a type name?
\end{itemize}
If I'm looking for references to a module, I do not want to be
distracted by Norse goddesses.
\end{slide}
\begin{slide}
The context solution.
Use a document grammar which is based on Erlang syntax but
enriches it with information people need and compilers don't.
\begin{tabular}{ll}
$<$name myth/SIF/ &goddess\\
$<$text/SIF/ &text fragment\\
$<$atom u=ext/SIF/ &file extension\\
$<$acr/SIF/ &acronym ($\rightarrow\,<$glossary$>$)\\
$<$atom u=tag/SIF/ &data structure tag\\
$<$modname/SIF/ &module name\\
$<$funcname/SIF/ &local function name\\
$<$funcname m=x/SIF/ &imported function name\\
$<$typename/SIF/ &type name\\
$<$pidname/SIF/ &PID name
\end{tabular}
\end{slide}
\begin{slide}
$<$name$>$, $<$acr$>$, $<$abbr$>$, $<$modname$>$,\\
$<$funcname$>$, $<$patname$>$, $<$pidname$>$ and
$<$typename$>$ already existed for use in text; $<$text$>$ already
existed for use in mathematical formulas. Allowing them to be
used where an otherwise unclassified $<$sym$>$ would have been used
was easy.
There is no way to predict all possible uses for an atom.
\begin{verbatim}
\end{verbatim}
says that the `u' attribute of a $<$sym$>$ can be any name.
\end{slide}
\begin{slide}
$<$text$>$ normally allows all sorts of markup inside it including
mathematical formulas; that doesn't make a lot of sense in atoms.
SGML lets us write
\begin{verbatim}
\end{verbatim}
The exclusion here says that certain tags are not allowed anywhere
inside an $<$expr$>$ element even if the rest of the grammar (such as
the definition of $<$text$>$) says that they are. XML does not have
exclusions, more's the pity (XHTML really needs them).
\end{slide}
\begin{slide}
What a system can't figure out, it can be told.
The u= attribute lets us tell Erlang/SGML how an atom is being used.
If a system can figure something out, it should.
This is very like type checking, but it goes beyond current Erlang
type systems. It should be possible to automatically propagate
usage information.
But no type checker will analyse the Erlang atoms that appear in the
explanatory text, and we do need to find them during maintenance.
\end{slide}
\begin{slide}
Nothing in Erlang syntax expresses a relationship between documents.
Directives such as
\begin{verbatim}
-module(this).
-export_to(that, [a/1,b/2]).
-import(the_other, [c/3]).
\end{verbatim}
express relationships between \emph{this} document and some modules.
The links are completed at run time by loading files.
\end{slide}
\begin{slide}
No official annotations express relationships between source files
and other documents such as standards, requirements, test plans,
user documentation, you name it.
There are not even any clear suggestions about how to use
-author, -vsn, and so on.
There \emph{is} an official set of document annotations for cataloguing
and indexing purposes: the Dublin Core.
\end{slide}
\begin{slide}
Dublin Core p1
\begin{itemize}\parskip=0pt\topsep=0pt
\item $<$identifier/unambiguous formal reference for this resource/
\item $<$title/Name by which humans know this resource/
\item $<$subject/keywords, key phrases, ACM codes, what it is about, for indexing/
\item $<$description/Reasonably full description\\
(lengthy abstract)/
\item $<$coverage/scope, {\it e.g.,} which standards\\
supported/
\item $<$date/yyyy-mm-dd/\\
When this resource became available
\end{itemize}
\end{slide}
\begin{slide}
Dublin Core p2
\begin{itemize}\parskip=0pt\topsep=0pt
\item $<$language/en\verb|_|NZ/\\
one for each language used; I extend this
\item $<$type/software/\\
but useful in descriptions of other files
\item $<$format$>$text/SGML$<$/format$>$\\
but useful to describe other files
\item $<$creator/Repeat with name of everyone\\
taking a major part in creation (-author?)/
\item $<$contributor/Repeat with name of each\\
contributor (minor maintenance?)/
\end{itemize}
\end{slide}
\begin{slide}
Dublin Core p3
\begin{itemize}\parskip=0pt\topsep=0pt
\item $<$publisher/Who caused this to be released/
\item $<$rights/State or cite rights held---\\
like -copyright but more possibilities/
\item $<$source/unambiguous formal reference for base of derivation/
\item $<$relation/unambiguous formal reference to related document/
\end{itemize}
\end{slide}
\begin{slide}
I propose adding a new directive to Erlang:
-dc(Attribute, Value[, Qualifier(s)]).
\begin{verbatim}
-dc(creator, "Karl Marx", [{type,architect}]).
-dc(creator, "Groucho Marx", [{type,coder}]).
-dc(contributor, "Greasy Marks").
-dc(rights, "Copyright (c) 2001 FuBar Ltd").
-dc(rights, "See licence.txt").
\end{verbatim}
\end{slide}
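\begin{slide}
Until such a directive exists, a rough approximation is possible.
User-defined attributes take a single term, so each entry can be
packed into one tuple and read back with
\verb|module_info(attributes)|. A sketch under that assumption;
the tuple layout and the name \verb|dc_demo| are illustrative only.
\begin{verbatim}
-module(dc_demo).
-export([dublin_core/1]).

-dc({creator, "Karl Marx", [{type,architect}]}).
-dc({contributor, "Greasy Marks"}).

%% All -dc entries recorded in Module.
dublin_core(Module) ->
    Attrs = Module:module_info(attributes),
    [E || {dc, Es} <- Attrs, E <- Es].
\end{verbatim}
Calling \verb|dc_demo:dublin_core(dc_demo)| returns the recorded
tuples, which an indexing tool could then catalogue.
\end{slide}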
\begin{slide}
The difficult thing about doing this is
\begin{quote}
studying all the relevant literature to glean techniques and ideas.
\end{quote}
\begin{itemize}
\item Should paragraphs be like HTML or TEI? (TEI)
\item What should tables be like?
\item How to express mathematics in SGML so that a human can type formulas
with useful syntax checking but without dying of exhaustion?
(I have used a point-and-click equation editor and it \emph{stank}.
Even typing MathML by hand was faster, which says a lot.)
\item How best to use the SHORTREF feature to make typing Erlang easy?
\end{itemize}
\end{slide}
\begin{slide}
SGML lets you attach macros to strings contextually. So
\begin{itemize}\parskip=0pt\topsep=0pt
\item \verb|"|abc\verb|"| $\rightarrow$\\
$<$str$>$abc$<$/str$>$
\item \verb|'|abc\verb|'| $\rightarrow$\\
$<$sym$>$abc$<$/sym$>$
\item \verb|['|a\verb|','|b\verb|','|c\verb|']| $\rightarrow$\\
$<$elst$><$sym$>$a$<$/sym$><$expn$>$\\
$<$sym$>$b$<$/sym$><$/expn$><$expn$>$\\
$<$sym$>$c$<$/sym$><$/expn$><$/elst$>$ in expressions
\item \verb|['|a\verb|','|b\verb|','|c\verb|']| $\rightarrow$\\
$<$plst$><$sym$>$a$<$/sym$><$patn$>$\\
$<$sym$>$b$<$/sym$><$/patn$><$patn$>$\\
$<$sym$>$c$<$/sym$><$/patn$><$/plst$>$ in patterns
\end{itemize}
Makes Erlang code much more readable and writable; not available in XML.
\end{slide}
\begin{slide}
Erlang/SGML has type declarations and annotations.
When generating Erlang source code, they can simply be ignored.
Erlang/SGML has protocol declarations; a protocol is a data type for
a process's message queue elements (as in OCCAM). Purpose is
documentation and maintenance. When generating Erlang source code,
they are ignored. When you want to know what P!\{frazzle,X\} means,
it's nice to have
\begin{verbatim}
...
@P,{'frazzle',@X}
\end{verbatim}
so we know at once where to look for the documentation.
\end{slide}
\begin{slide}
For references to documents, file names are too fragile, and in practice
URIs are too. SGML's Formal Public Identifiers are a time-tested way to
provide stable names for documents. Catalogues provide local mappings.
\begin{itemize}
\item[79] formal public identifier = owner identifier, \verb|"//"|,
text identifier
\item[80] owner identifier = ISO owner identifier $|$
registered owner identifier $|$\\ unregistered owner identifier
\item[82] registered owner identifier = \verb|"+//IDN "|, domain name
$|$ \verb|"+//"|, other data
\item[83] unregistered owner identifier = \verb|"-//"|, data
\end{itemize}
\end{slide}
\begin{slide}
\begin{itemize}
\item[84] text identifier = public text class, \verb|" "|,
unavailable text indicator?, public text description, \verb|"//"|,
public text language designation,
(\verb|"//"|, public text display version)?
\end{itemize}
{\it E.g.,} \verb|"+//IDN cs.otago.ac.nz//DOCUMENT|\\
\verb|Erlang/SGML report//EN"|
FPIs are text telling people who the owner is and what the document
is called; ask them where it is, update your catalogue.
FPIs may also be Internet URNs.
\end{slide}
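\begin{slide}
Productions 79--83 can be sketched in Erlang by splitting an FPI
at each \verb|"//"| and looking at the first piece. This is an
illustration, not a validating parser: the names are hypothetical,
and the unavailable text indicator is not treated specially.
\begin{verbatim}
-module(fpi_sketch).
-export([parse/1]).

%% Split an FPI at each "//".
parse(FPI) ->
    case string:split(FPI, "//", all) of
        ["+", Owner | Text] ->
            {registered, Owner, Text};
        ["-", Owner | Text] ->
            {unregistered, Owner, Text};
        [Owner | Text] ->
            {iso, Owner, Text}
    end.
\end{verbatim}
For the example FPI above, the owner part comes out as
\verb|"IDN cs.otago.ac.nz"| with the tag \verb|registered|.
\end{slide}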
\begin{slide}
Kristina Sirhuber found that Ellemtel and Uppsala people
\begin{itemize}\parskip=0pt\topsep=0pt
\item Did not like the tangling step.\\
Not a problem: an Erlang compiler could work straight from the
Erlang/SGML source (no reordering or fragments in the Erlang DTD)
\item Did not like the idea of having to learn yet another language.\\
This \emph{is} a problem; if you want to produce well-documented programs
you \emph{have} to learn a language other than Erlang. But the
Erlang manual set is written using SGML, and it's rather simpler than
\LaTeX.
\end{itemize}
Of course, ``Bird tracks'' would be even simpler, but would not solve
the context problems.
\end{slide}
\end{document}