[erlang-questions] Visual Erlang notation v0.1.0 - feedback request

Tue May 6 15:28:18 CEST 2014

Hi Richard,

Among many thoughtful comments and contributions, you've touched on a MAJOR oversight in the documentation of all too many open-source projects:

>       WHAT        does this function do?
>       HOW        does it do that?
>       WHY        does it exist?

WHAT does this project do? 
HOW does it do that? 
WHY does it exist? (E.g., what problem does it solve?)

The underlying question is, how much can the documentation author assume about the intended reader's level of knowledge? Too many authors assume far too much about the reader and his/her level of knowledge.

We're all noobies at one time or another with respect to one software technology or another. And, as a life-long charter-member noobie, I can't tell you how many worthy open-source projects I've turned away from because I couldn't understand what in the hell was going on, or why.

From deep in Noobieville, I can't tell you how much I appreciate the fine ideas and thinking emerging in this thread.

Best wishes,

LRP 

-----Original Message-----
From: "Richard A. O'Keefe" <ok@REDACTED>
Sent: Tuesday, May 6, 2014 1:22am
To: "Torben Hoffmann" <torben.hoffmann@REDACTED>
Cc: "Erlang-Questions Questions" <erlang-questions@REDACTED>
Subject: Re: [erlang-questions] Visual Erlang notation v0.1.0 - feedback request

On 5/05/2014, at 11:11 PM, Torben Hoffmann wrote:
> ok@REDACTED writes:
> 
>> I am viewing visual_erlang.pdf in Acrobat Professional
>> and most of the figures are missing the connecting lines
>> you'd expect to find in them.
>> 
> That is most peculiar - I am using Skim, but opening in the free Adobe Reader shows
> no signs of errors.
> 
> Are there any log files that state anything about my PDF being bad?

On my laptop, using Acrobat Professional, the lines weren't there.
On my desktop, using Preview, they are.
Presumably a PDF version issue.

Re figure 10:  "Module my_mod has API f/N".
Eh?  "Application *Programming* Interface" (as opposed to
"Application *Binary* Interface") is the whole set of source-
level accessible operations provided by something.
"f/N" is (a name for) a SINGLE FUNCTION.
Why use three obscure and arguably inappropriate syllables
"eh pee ai" when two clear appropriate syllables "funk shn" will do?
The "API" of a module (if we want to embrace that ugly and
confusing term at all) is the *whole* export set of the module.

The text says "a diamond head from the module to the function"
but figure 10 shows a connector that I cannot perceive as
anything but pointing from the function to the module.
Given that modules and functions have different kinds of names
and different kinds of boxes, it's not clear that the direction
of the link needs to be distinguished in yet a third way.

As I shall argue shortly, we may need to include functions
in a diagram that are NOT exported from a module.

Figure 11 says "Module my_mod has instance P1" but while
parametric modules do have instances, those instances are
modules, not processes.  No process is, and no process
can be, an *instance* of a module.  On the other hand, it
is fair to say that a module *creates* a process.  So
some kind of

	++========++  «creates»   _______
	|| module || ----------> (process)
	++========++              ~~~~~~~

makes sense.  (Note that the information flow is *from* the
module *to* the process, so a data flow arrow goes this way
around.)

Amongst other things, for X to be an instance of Y, it
has to be a special case of the *whole* of Y.  But if a
module contains two 'spawn' calls, how can that be?
(This is in marked contrast to classes, where an object
*is* an instance of the *whole* class; every part of the
class is equally relevant to it.)

I am not arguing against showing a relationship between
a module and a process, just saying that the kind of
relationship isn't recognisably an "instance" relationship.

Actually, processes are NOT created by modules,
processes are created by functions in modules.
So we have

	++========++  is-part-of  +----------+  «creates»   _______
	|| module || <----------- | function | ----------> (process)
	++========++              +----------+              ~~~~~~~

This distinction matters when a module contains more than one
function that calls spawn.  In this case, it is quite likely
that the function that calls spawn will not be exported.
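The function-level picture can be sketched as data: keep is-part-of and «creates» as separate relations and derive the coarser module-to-process link by a join, so nothing is lost when the spawning function is not exported. A minimal Python sketch; the names start/0, spawn_worker/1 and Worker are hypothetical, and function labels are assumed to be unique within one diagram.

```python
# Hypothetical facts for one module (names are illustrative only).
# is_part_of: (function, module); exported: the export set;
# creates: (function, process label in the diagram).
# Assumption: function labels are unique within the diagram; a real
# tool would key functions on {Module, Name, Arity}.
is_part_of = [("start/0", "my_mod"), ("spawn_worker/1", "my_mod")]
exported = {"start/0"}
creates = [("spawn_worker/1", "Worker")]

def module_creates(is_part_of, creates):
    """Derive the coarser (module, process) relation: a module
    'creates' a process if one of its functions does."""
    return {(module, process)
            for (function, module) in is_part_of
            for (creator, process) in creates
            if creator == function}

def unexported_creators(is_part_of, exported, creates):
    """Functions that create processes but are not exported, i.e.
    the ones a diagram showing only the export set would hide."""
    creators = {creator for (creator, _) in creates}
    return {function for (function, _) in is_part_of
            if function in creators and function not in exported}

print(module_creates(is_part_of, creates))                 # {('my_mod', 'Worker')}
print(unexported_creators(is_part_of, exported, creates))  # {'spawn_worker/1'}
```

The module-level arrow is then a derived view rather than a primary fact, which is what lets a diagram show either level without contradicting the other.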

We need to distinguish between a process p that is
registered under the name 'p' and a process P that is
just some process we're labelling in this diagram.

And that means that using P1:my_mod is doubly worrisome:
P1 is a *creation* of my_mod, not an *instance* of it,
and my_mod is *not* a process registered under the name
'my_mod'.

We turn to figure 14: "Process P1 has API f/n".
In what sense does a process "have" a function?
Consider a process P1 spawned in module M1 which
exports f/n, but where P1 is currently executing
in module M2, which also exports f/n.  Which f/n
does P1 "have"?  I honestly cannot understand
this.  When would you use it?

Oh, figure 13 shows a ModuleProcess.  What the heck
is a ModuleProcess?  (Sounds a bit like an instance
of a synchronized class.)

Staring hard at figure 15 and reading the text nearby
over and over until the cotton wool is bursting from
my head, I *think* figure 15 is supposed to mean

	module my_mod exports f/N and g/N.
	It creates a process P1 by some unspecified means.
	If you call f/N it will ignore P1 (although this
	is an argument from silence).
	If you call g/N it will do something with P1;
	it might send a message, it might kill it
	using erlang:exit/2, it might monkey with its
	flags using erlang:process_flag/3, it might
	suspend or resume it; I'm *not saying*.

Is "a call to m:g/N may produce an effect in P" what
is meant by "P _has_ g/N"?

>> So for me one of the essential features of any "boxes and
>> connectors" notation is that there has to be a defined
>> way to map that to some sort of set of facts that I can
>> write queries on.  That is, it has to be more than pretty
>> pictures.
>> 
>> Turtle (RDF) is fine; SWI Prolog comes with a decent kit for
>> loading/exporting/storing/querying RDF.  Datalog is fine too.
>> It might be OK to make it something that can be visually
>> manipulated in 'dia', and define a mapping between dia
>> file format and a semantic form.
>> 
>> Thing is, *IF* this (or any other notation) is any good at
>> revealing just enough relevant structure, then it ought to
>> be possible to write "flaw detectors" to spot antipatterns.
>> 
> I'm certainly for this kind of formalism, but I am not able to do it since my
> knowledge in this area is non-existent.

You do not need to understand any particular formalism.
Actually, that was the point of me mentioning several.
What you need to have is a SEMANTICS which involves
specifying the types of concepts and the relationships
between them.  If you know how to use a relational
data base, *that* will do.

Given that you have a diagram "process P has function f/N",
represent that as
	process_function_arity (process_name, function_name, arity)
	                        'P',          'f',           N
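As a concrete (hypothetical) illustration of the relational reading, the same fact can live in an ordinary SQL table. This sketch uses Python's built-in sqlite3 module with an in-memory database; the arity 2 stands in for N and is an arbitrary choice.

```python
import sqlite3

# In-memory database; the table mirrors the relation above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE process_function_arity (
        process_name  TEXT    NOT NULL,
        function_name TEXT    NOT NULL,
        arity         INTEGER NOT NULL
    )
""")

# One row per "process P has function f/N" arrow in a diagram.
conn.execute("INSERT INTO process_function_arity VALUES (?, ?, ?)",
             ("P", "f", 2))
conn.commit()

# Query: which functions does process P "have"?
rows = conn.execute(
    "SELECT function_name, arity FROM process_function_arity "
    "WHERE process_name = ?",
    ("P",),
).fetchall()
print(rows)  # [('f', 2)]
```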

> Do you have pointers to some introduction texts that can help guide the design of the
> textual notation?

Ah, but I'm not talking about a textual notation!
I'm asking for a *semantics*.
I'm asking that a graphical notation should have
an *abstract* equivalent as a set of facts (which 
could be RDF triples or rows in a relational data
base or ...) and the names and meanings of the
predicates in this abstract equivalent should be
spelled out.  How those facts are turned into text
(if indeed they ever are) is comparatively unimportant.

Let's take figure 15.

	module(my_mod).
	function('f/N', f, N).
	process_var('P1').
	function('g/N', g, N).
	instance('P1', my_mod).
	has(my_mod, 'f/N').
	has('P1', 'g/N').

Here we have facts like "module(Atom)" -- Atom is the name of a
module -- and "process_var(Atom)" -- Atom is a variable name
local to the current diagram which stands for a process -- and
"function(Atom1, Atom2, Integer)" -- Atom1 is an arbitrary
atom labelling a function box, Atom2 is the name of that
function, and Integer is its arity -- and "instance(Atom1, Atom2)"
-- Atom1 is either a process node name or a process_var node
name, Atom2 is a module name, and that module creates that
process -- and so on.
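To make "a set of facts I can write queries on" concrete, here is a minimal Python sketch that stores figure 15's facts as tuples, runs a simple query, and implements one toy "flaw detector". The tuple encoding, the helper names and the rule "every process variable must be created by some module" are illustrative assumptions, not part of Visual Erlang.

```python
# Figure 15's facts as (predicate, *args) tuples; N is left symbolic.
facts = {
    ("module", "my_mod"),
    ("function", "f/N", "f", "N"),
    ("process_var", "P1"),
    ("function", "g/N", "g", "N"),
    ("instance", "P1", "my_mod"),
    ("has", "my_mod", "f/N"),
    ("has", "P1", "g/N"),
}

def query(facts, predicate):
    """Return the argument tuples of every fact with this predicate."""
    return sorted(f[1:] for f in facts if f[0] == predicate)

def orphan_processes(facts):
    """Toy flaw detector: process variables that no module creates."""
    procs = {args[0] for args in query(facts, "process_var")}
    created = {args[0] for args in query(facts, "instance")}
    return procs - created

print(query(facts, "has"))      # [('P1', 'g/N'), ('my_mod', 'f/N')]
print(orphan_processes(facts))  # set()
```

Adding a stray `("process_var", "P2")` fact would make the detector report `{'P2'}`; richer antipattern checks are just more queries over the same facts.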

It is quite unimportant whether it's written like that,
or like
	(define-node '|my_mod| :type module)
	(define-node '|f/N|    :type function)
	(define-node '|g/N|    :type function)
	(define-node '|P1|     :type process-var)
	(instance '|P1| '|my_mod|)
	(has '|my_mod| '|f/N|)
	(has '|P1|     '|g/N|)
or like
	<ve>
	  <module id="my_mod"/>
	  <function id="f/N"/>
	  <function id="g/N"/>
	  <process-var id="P1"/>
	  <instance id="P1" of="my_mod"/>
	  <has id="my_mod" funs="f/N"/>
	  <has id="P1" funs="g/N"/>
	</ve>

Given any of these, it's trivial to produce the other two.
What matters is the *semantics* these express.
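To show that the spellings are interchangeable once the semantics is fixed, here is a minimal Python sketch that holds figure 15's nodes and relations as plain data and renders them both as the Lisp-style forms and as the XML form. The data layout and function names are assumptions for illustration; the column alignment of the Lisp version above is not reproduced.

```python
import xml.etree.ElementTree as ET

# Figure 15 as plain data (hypothetical encoding).
nodes = [("my_mod", "module"), ("f/N", "function"),
         ("g/N", "function"), ("P1", "process-var")]
instances = [("P1", "my_mod")]
has = [("my_mod", "f/N"), ("P1", "g/N")]

def to_lisp(nodes, instances, has):
    """Render the facts as Lisp-style forms, one per line."""
    lines = [f"(define-node '|{n}| :type {t})" for n, t in nodes]
    lines += [f"(instance '|{p}| '|{m}|)" for p, m in instances]
    lines += [f"(has '|{x}| '|{f}|)" for x, f in has]
    return "\n".join(lines)

def to_xml(nodes, instances, has):
    """Render the same facts as a <ve> XML document."""
    root = ET.Element("ve")
    for n, t in nodes:
        ET.SubElement(root, t, id=n)
    for p, m in instances:
        ET.SubElement(root, "instance", {"id": p, "of": m})
    for x, f in has:
        ET.SubElement(root, "has", id=x, funs=f)
    return ET.tostring(root, encoding="unicode")

print(to_lisp(nodes, instances, has))
print(to_xml(nodes, instances, has))
```

Parsing either form back into the same three lists is equally mechanical, which is the point: the lists are the semantics, the renderings are incidental.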

A human-readable textual notation is something else again,
satisfying different design criteria.
> 
> What I have been missing, and what drives me to create Visual Erlang, is a way to
> capture how the supervision tree interacts with the functionality of the
> module/processes.

An example that illustrates precisely that would be great.
> 
> Apart from being able to talk about what a process should be responsible for I am
> also aiming at getting to a point where I can interactively browse through a code
> base and uncover the architecture, while deciding which parts to hide through
> abstractions. That is a bit down the road, though.

I have been day-dreaming about a paper for an Erlang conference
with the working title "Why can't I see the structure?".

Let me briefly mention some annotations I've started using
with Smalltalk:

	<compatibility: #DIALECT>

	This method exists for compatibility with DIALECT.
	Don't blame *ME* for the interface.
	For reasons and examples, read their documentation.

	<specialCaseOf: #SELECTOR>

	This method isn't really necessary; you could get
	the same result by using #SELECTOR but with more
	effort.  If you're trying to understand this class,
	ignore this method for now.

	<compositionOf: #SELECTOR and: #SELECTOR2
		[for: #time|#space|#safety]>

	This method gets the same effect that combining
	SELECTOR and SELECTOR2 in the obvious way would.
	It might save time, save memory, or be legal when
	for technical reasons the separate parts wouldn't be.
	If you are trying to understand this class,
	ignore this method for now and look at the other two.

	<supportFor: #SELECTOR>

	This method isn't meant for normal use; it exists
	to support SELECTOR.  If you want to understand
	this method, look at SELECTOR first.

This is in addition to the grouping of methods into "categories"
that is traditional in Smalltalk.
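Annotations like these are ordinary data that a browser can query. As a language-neutral illustration (Python, not Smalltalk), here is a minimal sketch with hypothetical selectors that lists the methods worth reading first and, for a secondary method, the selectors its annotation points to. The table layout and helper names are assumptions.

```python
# Hypothetical method table: selector -> list of (annotation, argument).
# Annotation names follow the pragmas above; selectors are illustrative.
methods = {
    "occurrencesOf:": [],
    "includes:": [("specialCaseOf:", "occurrencesOf:")],
    "add:": [],
    "addAll:": [("compositionOf:", ("add:", "do:"))],
    "growTo:": [("supportFor:", "add:")],
}

# Annotations that mean "ignore this method on a first reading".
SECONDARY = {"specialCaseOf:", "compositionOf:", "supportFor:"}

def read_first(methods):
    """Selectors with no 'secondary' annotation, in sorted order."""
    return sorted(sel for sel, notes in methods.items()
                  if not any(tag in SECONDARY for tag, _ in notes))

def look_at_instead(methods, selector):
    """For a secondary method, the selector(s) its annotation points to."""
    targets = []
    for tag, arg in methods[selector]:
        if tag in SECONDARY:
            targets.extend(arg if isinstance(arg, tuple) else [arg])
    return targets

print(read_first(methods))                  # ['add:', 'occurrencesOf:']
print(look_at_instead(methods, "addAll:"))  # ['add:', 'do:']
```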

When I look at a function in a module, I need to know

	WHAT	does this function do?
	HOW	does it do that?
	WHY	does it exist?

The thing that is systematically missing from things like UML
is the WHY.  I find that while UML _could_ tell me a great
deal, in practice it _doesn't_ tell me anything I couldn't
get from a cross-referencer.

So if I am staring at a diagram showing a bunch of processes,
I want to know WHY they are there.  If there are two processes
with a third managing them, is work being sent to them both,
is one a hot standby for the other, are they using different
algorithms so that one is serving as a cross-check for another,
is one a high security version and one a low security version,
or what?

Arguably, when I know WHY processes exist, I should be able
to figure out *whether* they should be linked.

> Funny that you bring this up - there has just been an application for an EU grant
> (CoCo) to look at communication contracts between processes, which is very much about
> building on top of UBF and getting something that captures the protocols in the same way
> specs capture types today.
> 
> I'd love to get protocols covered as well, but I don't feel that squeezing them into
> Visual Erlang is the right approach.

Look, instead of

	(Process_Var : module_name)

the only change you need is to use

	(Process_Var : protocol_name)

instead.  Protocols really *are* "types for processes".
The bulk of the information about the protocol could be in some
other document, but *some* identification of the protocol
belongs in the process box.

There's arguably something else.

UBF has the notion of a process being in a *state*.
For that matter, Microsoft's Sing# language for the
Singularity operating system has precisely the same
notion built into its core syntax for channels.

It isn't just instances of gen_fsm that have states.

Maybe you could have

          (process)
         /         \
<|state|>           <|state|>

or something.  The way OPM puts states inside objects is a royal
road to unreadability for non-trivial diagrams.
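To illustrate "a process being in a state" as part of its protocol type, here is a minimal Python sketch of a protocol given as a state-transition table, with a checker that replays a trace of messages. The protocol, its states and its messages are invented, and this is not UBF's contract syntax; it only mirrors the idea that whether a message is legal depends on the current state.

```python
# Hypothetical protocol: state -> {message: next_state}.
FILE_PROTOCOL = {
    "closed": {"open": "opened"},
    "opened": {"read": "opened", "close": "closed"},
}

def check_trace(protocol, start, messages):
    """Replay messages from the start state.
    Returns (True, final_state) or (False, description of the error)."""
    state = start
    for msg in messages:
        transitions = protocol.get(state, {})
        if msg not in transitions:
            return False, f"{msg!r} not allowed in state {state!r}"
        state = transitions[msg]
    return True, state

print(check_trace(FILE_PROTOCOL, "closed", ["open", "read", "close"]))
# (True, 'closed')
print(check_trace(FILE_PROTOCOL, "closed", ["read"]))
# (False, "'read' not allowed in state 'closed'")
```

A diagram would then only need the protocol name in the process box and, where useful, the current state, with the table itself kept in a separate document.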

> I have stolen some of the ideas from Object-Process Methodology (OPM).

Hmm.  The only thing I find more off-putting than "Holistic"
in a book title is "Wholistic".

If you thought OPM was a good starting point, then the idea of
displaying UBF states in an Erlang notation should appeal.

> Hopefully the CoCo project will shed some light on this.

Well, there's a pretty strong glimmer right here.

In UML, you have |instance name: class name|.
The class name is *NOT* the name of whatever it is
that creates the instance and it is NOT the name
of the package that contains the 'new' expression.
The logical application of that to Erlang is
(process name: protocol name) as suggested above.
protocol : process :: interface : object.

I don't suppose the CoCo project want a long-distance collaborator (:-).

There are at least three purposes a graphical notation may serve:

(1) Design.

    For this, it is important that it should be very easy to
    *change* a diagram and there should be good tools to *check* it.

(2) Code generation.

    Given a sufficiently detailed set of diagrams, you might
    want to generate (skeletal) code.  This absolutely requires
    that the notation SCALE both up (to 100,000 line systems at
    least) and down.

(3) Documentation.

    What this requires above all is readability.

    A central limitation on readability is that visual
    diagrams just *cannot* hold very much information.
    And *that* means that you have to have a multitude
    of overlapping diagrams and need good tools
    for flicking through dozens if not hundreds of
    diagrams, being able to specify what to look at by
    something information retrieval-ish and by
    something databasish.

It's not clear to me that a notation designed for one of these
purposes will be particularly good at the others.

You know, I think the way to get started on something like
this is not to work from the bottom up picking notation from
elsewhere and adapting it, but to take something like Yaws
or Cowboy and try to document *that*.  Above all, it would
not be possible to avoid the scale issue, and _that_, I think,
is where a "visual" notation needs the most thought.

_______________________________________________
erlang-questions mailing list
erlang-questions@REDACTED
http://erlang.org/mailman/listinfo/erlang-questions