[erlang-questions] Visual Erlang notation v0.1.0 - feedback request

Tue May 6 11:16:53 CEST 2014

Hi Richard,

Richard A. O'Keefe writes:

> On 5/05/2014, at 11:11 PM, Torben Hoffmann wrote:
>> ok@REDACTED writes:
>> 
>>> I am viewing visual_erlang.pdf in Acrobat Professional
>>> and most of the figures are missing the connecting lines
>>> you'd expect to find in them.
>>> 
>> That is most peculiar - I am using Skim, but opening in the free Adobe Reader shows
>> no signs of errors.
>> 
>> Are there any log files that state anything about my PDF being bad?
>
> On my laptop, using Adobe Professional, the lines weren't there.
> On my desktop, using Preview, they are.
> Presumably a PDF version issue.
>
Let's hope that the issue is localised - otherwise I will investigate further.

> Re figure 10:  "Module my_mod has API f/N".
> Eh?  "Application *Programming* Interface" (as opposed to
> "Application *Binary* Interface") is the whole set of
> source-level accessible operations provided by something.
> "f/N" is (a name for) a SINGLE FUNCTION.
> Why use three obscure and arguably inappropriate syllables
> "eh pee ai" when two clear appropriate syllables "funk shn" will do?
> The "API" of a module (if we want to embrace that ugly and
> confusing term at all) is the *whole* export set of the module.
>
I agree. Very good point.

I wanted to have a distinction between functions that are part of the API of the
module and internal functions as well as the more abstract functionality.

But this has an easy fix: we change the textual version to "Module my_mod has
function f/N."

Should the need for internal functions arise it can be stated as "Module my_mod has
internal function f/N."

> The text says "a diamond head from the module to the function"
> but figure 10 shows a connector that I cannot perceive as
> anything but pointing from the function to the module.
> Given that modules and functions have different kinds of names
> and different kinds of boxes, it's not clear that the direction
> of the link needs to be distinguished in yet a third way.
>
I might have tried to steal too much from OPM in this case.

I think you are right: modules, processes and functions have different boxes, and
when they are connected it is obvious what the meaning is.

A connection from a module to a function means that the function is defined in the
module.

A connection from a process to a function just follows the extension of the module to
the function, so same thing.

A connection from a module to a process means that the process has been started (!)
from the module.

> As I shall argue shortly, we may need to include functions
> in a diagram that are NOT exported from a module.
>
That was the intention all along - I should have made that clearer or have let you
have access to my brain, but as Erlangers we sort of frown upon shared memory, right?
;-)

> Figure 11 says "Module my_mod has instance P1" but while
> parametric modules do have instances, those instances are
> modules, not processes.  No process is, and no process
> can be, an *instance* of a module.  On the other hand, it
> is fair to say that a module *creates* a process.  So
> some kind of
>
> 	++========++  «creates»   _______
> 	|| module || ----------> (process)
> 	++========++              ~~~~~~~
>
> makes sense.  (Note that the information flow is *from* the
> module *to* the process, so a data flow arrow goes this way
> around.)
>
Excellent! That is a much better way of stating this relationship.

The only problem with this proposal is that it is not my own idea!! 

> Amongst other things, for X to be an instance of Y, it
> has to be a special case of the *whole* of Y.  But if a
> module contains two 'spawn' calls, how can that be?
> (This is in marked contrast to classes, where an object
> *is* an instance of the *whole* class; every part of the
> class is equally relevant to it.)
>
Hmmm, I didn't expect that scenario...

> I am not arguing against showing a relationship between
> a module and a process, just saying that the kind of
> relationship isn't recognisably an "instance" relationship.
>
Totally agree. My brain still has a bit too much OPM in it - your comments will help
clear that out.

> Actually, processes are NOT created by modules,
> processes are created by functions in modules.
> So we have
>
> 	++========++  is-part-of  +----------+  «creates»   _______
> 	|| module || <----------- | function | ----------> (process)
> 	++========++              +----------+              ~~~~~~~
>
> This distinction matters when a module contains more than one
> function that calls spawn.  In this case, it is quite likely
> that the function that calls spawn will not be exported.
>
This is true, but when I reflect on how everyone I have worked with talks, there is
a very strong connection between a module and the processes created by, typically,
one function in that module.

When I talk about processes created by a function in my_mod I do refer to them as
being my_mod's - that is the reasoning behind the notation.

But my approach totally ignores the detail of which function is used to spawn
the process, and there can be cases where that is an issue.

Visual Erlang has to be natural so I will propose that we get two notations for this.

Module my_mod creates process P.  # The spawning function is abstracted.

Module my_mod has function f/N.
Function f/N creates process P.

Then the distinction can be made, but it is also possible to focus on the abstract
relationship. 
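
To make the two notations concrete, here is a minimal sketch of the kind of module I
have in mind. The names start/0, spawn_worker/0 and loop/0 are hypothetical and only
there for illustration; the point is that the function calling spawn is not exported.

```erlang
-module(my_mod).
-export([start/0]).

%% Exported entry point. Seen from outside, only the abstract statement
%% is visible: "Module my_mod creates process P."
start() ->
    spawn_worker().

%% Internal (non-exported) function that actually calls spawn. In the
%% detailed notation: "Module my_mod has internal function spawn_worker/0.
%% Function spawn_worker/0 creates process P."
spawn_worker() ->
    spawn(fun loop/0).

%% Body of process P: answers pings until told to stop.
loop() ->
    receive
        {ping, From} ->
            From ! {pong, self()},
            loop();
        stop ->
            ok
    end.
```

The abstract notation hides spawn_worker/0 completely, while the detailed notation
names it. Both describe the same code.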

> We need to distinguish between a process p that is
> registered under the name 'p' and a process P that is
> just some process we're labelling in this diagram.
>
Agreed - I hadn't gotten to that level of detail in the first draft.

> And that means that using P1:my_mod is doubly worrisome:
> P1 is a *creation* of my_mod, not an *instance* of it,
> and my_mod is *not* a process registered under the name
> 'my_mod'.
>
It is meant as a shorthand for what I have now stated to be:

Module my_mod creates process P.

Will that work? Or is there a better way to represent this relationship?

> We turn to figure 14: "Process P1 has API f/n".
> In what sense does a process "have" a function?
> Consider a process P1 spawned in module M1 which
> exports f/n, but where P1 is currently executing
> in module M2, which also exports f/n.  Which f/n
> does P1 "have"?  I honestly cannot understand
> this.  When would you use it?
>
I need to explain the context a bit better, fair point.

f/N is a function in module my_mod that is used to interact with the process P1
created by the module my_mod.
This comes back to the abstract way I explained above - this is simply how we talk
about processes and "their" functions.

Without this association of functions to processes one would have to make the message
passing from the called function to the process explicit, which is cumbersome and
goes against the way many people talk about processes.
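
As a small sketch of what "Process P1 has function f/N" abstracts over, assume a
hypothetical my_mod where f/2 does nothing but wrap a message exchange with P1 (the
message shapes and the running total are made up for the example):

```erlang
-module(my_mod).
-export([start/0, f/2]).

%% "Module my_mod creates process P1."
start() ->
    spawn(fun() -> loop(0) end).

%% f/2 runs in the caller's process, but the work is done in P1.
%% This is what the diagram shortens to "Process P1 has function f/N".
f(P1, N) ->
    Ref = make_ref(),
    P1 ! {add, self(), Ref, N},
    receive
        {Ref, Total} -> Total
    after 1000 ->
        erlang:error(timeout)
    end.

%% P1 keeps a running total and replies with the new value.
loop(Total) ->
    receive
        {add, From, Ref, N} ->
            NewTotal = Total + N,
            From ! {Ref, NewTotal},
            loop(NewTotal)
    end.
```

Drawing the send and the reply explicitly would need two more connectors for every
such function, which is the clutter I want to avoid.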

> Oh, figure 13 shows a ModuleProcess.  What the heck
> is a ModuleProcess?  (Sounds a bit like an instance
> of a synchronized class.)
>
:my_mod: on its own is supposed to designate a process called my_mod created by the
module my_mod.

It happens quite often that a module is implemented just to create one named process
with the same name as the module, so I wanted a shorthand for that.

I'm very open to different names and notation, but I want a shorthand for this.
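
The pattern I mean looks roughly like the sketch below. The functions bump/0 and
stop/0 and the counter state are hypothetical; what matters is that the module
registers its single process under ?MODULE.

```erlang
-module(my_mod).
-export([start/0, bump/0, stop/0]).

%% Create the one process of this module and register it under the
%% module's own name, so :my_mod: in a diagram refers to this process.
start() ->
    Pid = spawn(fun() -> loop(0) end),
    true = register(?MODULE, Pid),
    Pid.

%% Callers address the process by its registered name, not by pid.
bump() ->
    Ref = make_ref(),
    ?MODULE ! {bump, self(), Ref},
    receive
        {Ref, Count} -> Count
    after 1000 ->
        erlang:error(timeout)
    end.

stop() ->
    ?MODULE ! stop,
    ok.

loop(Count) ->
    receive
        {bump, From, Ref} ->
            From ! {Ref, Count + 1},
            loop(Count + 1);
        stop ->
            ok
    end.
```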

> Staring hard at figure 15 and reading the text nearby
> over and over until the cotton wool is bursting from
> my head, I *think* figure 15 is supposed to mean
>
> 	module my_mod exports f/N and g/N.
> 	It creates a process P1 by some unspecified means.
> 	If you call f/N it will ignore P1 (although this
> 	is an argument from silence).
> 	If you call g/N it will do something with P1;
> 	it might send a message, it might kill it
> 	using erlang:exit/2, it might monkey with its
> 	flags using erlang:process_flag/3, it might
> 	suspend or resume it, I'm *not saying*).
>
> Is "a call to m:g/N may produce an effect in P" what
> is meant by "P _has_ g/N"?
>
I think I have explained the bits involved in this above, so I hope that the
intention behind this is now clear.

With the new syntax:
Module my_mod creates process P1.
Module my_mod has function f/N.
Process P1 has function g/N.

g/N is a function of the my_mod module whose calls are handled in the process P1.

>
>>> So for me one of the essential feature of any "boxes and
>>> connectors" notation is that there has to be a defined
>>> way to map that to some sort of set of facts that I can
>>> write queries on.  That is, it has to be more than pretty
>>> pictures.
>>> 
>>> Turtle (RDF) is fine; SWI Prolog comes with a decent kit for
>>> loading/exporting/storing/querying RDF.  Datalog is fine too.
>>> It might be OK to make it something that can be visually
>>> manipulated in 'dia', and define a mapping between dia
>>> file format and a semantic form.
>>> 
>>> Thing is, *IF* this (or any other notation) is any good at
>>> revealing just enough relevant structure, then it ought to
>>> be possible to write "flaw detectors" to spot antipatterns.
>>> 
>> I'm certainly for this kind of formalism, but I am not able to do it since my
>> knowledge in this area is non-existent.
>
> You do not need to understand any particular formalism.
> Actually, that was the point of me mentioning several.
> What you need to have is a SEMANTICS which involves
> specifying the types of concepts and the relationships
> between them.  If you know how to use a relational
> data base, *that* will do.
>
> Given that you have a diagram "process P has function f/N",
> represent that as
> 	process_function_arity (process_name, function_name, arity)
> 	                        'P',          'f',           N
>
>> Do you have pointers to some introduction texts that can help guide the design of the
>> textual notation?
>
> Ah, but I'm not talking about a textual notation!
> I'm asking for a *semantics*.
> I'm asking that a graphical notation should have
> an *abstract* equivalent as a set of facts (which 
> could be RDF triples or rows in a relational data
> base or ...) and the names and meanings of the
> predicates in this abstract equivalent should be
> spelled out.  How those facts are turned into text
> (if indeed they ever are) is comparatively unimportant.
>
> Let's take figure 15.
>
> 	module(my_mod).
> 	function('f/N', f, N).
> 	process_var('P1').
> 	function('g/N', g, N).
> 	instance('P1', my_mod).
> 	has(my_mod, 'f/N').
> 	has('P1', 'g/N').
>
> Here we have facts like "module(Atom)" -- Atom is the name of a
> module -- and "process_var(Atom)" -- Atom is a variable name
> local to the current diagram which stands for a process and
> "function(Atom1, Atom2, Integer)" -- Atom1 is an arbitrary
> atom labelling a function box, Atom2 is the name of that
> function, and Integer is its arity -- and "instance(Atom1, Atom2)"
> -- Atom1 is either a process node name or a process_var node
> name and Atom2 is a module name and that module creates that
> process-- and so on.
>
> It is quite unimportant whether it's written like that,
> or like
> 	(define-node '|my_mod| :type module)
> 	(define-node '|f/N|    :type function)
> 	(define-node '|g/N|    :type function)
> 	(define-node '|P1|     :type process-var)
> 	(instance '|P1| '|my_mod|)
> 	(has '|my_mod| '|f/N|)
> 	(has '|P1|     '|g/N|)
> or like
> 	<ve>
> 	  <module id="my_mod"/>
> 	  <function id="f/N"/>
> 	  <function id="g/N"/>
> 	  <process-var id="P1"/>
> 	  <instance id="P1" of="my_mod"/>
> 	  <has id="my_mod" funs="f/N"/>
> 	  <has id="P1" funs="g/N"/>
> 	</ve>
>
> Given any of these, it's trivial to produce the other two.
> What matters is the *semantics* these express.
>
> A human-readable textual notation is something else again,
> satisfying different design criteria.
I get your point. It makes sense.

I will try to add a semantic layer as well.
In fact, it is probably best to define that first and then decide if a textual
notation is needed.
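
As a first, very rough sketch of what such a semantic layer could look like, here are
the figure 15 facts as plain Erlang terms, with one query and one simple flaw
detector. The module name ve_facts, the fact shapes and the use of "creates" in place
of "instance" are my assumptions, not a settled design; the atom 'N' is just a
placeholder for the unspecified arity.

```erlang
-module(ve_facts).
-export([figure15/0, functions_of/2, orphan_processes/1]).

%% Figure 15 as a set of facts.
figure15() ->
    [{module, my_mod},
     {function, 'f/N', f, 'N'},
     {function, 'g/N', g, 'N'},
     {process_var, 'P1'},
     {creates, my_mod, 'P1'},
     {has, my_mod, 'f/N'},
     {has, 'P1', 'g/N'}].

%% Query: labels of the function boxes attached to a module or process.
functions_of(Owner, Facts) ->
    [F || {has, O, F} <- Facts, O =:= Owner].

%% Flaw detector: process variables that nothing in the diagram creates.
orphan_processes(Facts) ->
    [P || {process_var, P} <- Facts, not is_created(P, Facts)].

is_created(P, Facts) ->
    lists:any(fun({creates, _, Q}) -> Q =:= P;
                 (_) -> false
              end, Facts).
```

With something like this in place, ve_facts:functions_of('P1', ve_facts:figure15())
answers the question "which functions does P1 have?", and a diagram with a process
that nobody creates can be flagged automatically.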

>> 
>> What I have been missing, and what drives me to create Visual Erlang, is a way to
>> capture how the supervision tree interacts with the functionality of the
>> module/processes.
>
> An example that illustrates precisely that would be great.
It will be added - either in this document or in the Erlang Concurrency Patterns.

>> 
>> Apart from being able to talk about what a process should be responsible for I am
>> also aiming at getting to a point where I can interactively browse through a code
>> base and uncover the architecture, while deciding which parts to hide through
>> abstractions. That is a bit down the road, though.
>
> I have been day-dreaming about a paper for an Erlang conference
> with the working title "Why can't I see the structure?".
>
> Let me briefly mention some annotations I've started using
> with Smalltalk:
>
> 	<compatibility: #DIALECT>
>
> 	This method exists for compatibility with DIALECT.
> 	Don't blame *ME* for the interface.
> 	For reasons and examples, read their documentation.
>
> 	<specialCaseOf: #SELECTOR>
>
> 	This method isn't really necessary; you could get
> 	the same result by using #SELECTOR but with more
> 	effort.  If you're trying to understand this class,
> 	ignore this method for now.
>
> 	<compositionOf: #SELECTOR and: #SELECTOR2
> 		[for: #time|#space|#safety]>
>
> 	This method gets the same effect that combining
> 	SELECTOR and SELECTOR2 in the obvious way would.
> 	It might save time, save memory, or be legal when
> 	for technical reasons the separate parts wouldn't be.
> 	If you are trying to understand this class,
> 	ignore this method for now and look at the other two.
>
> 	<supportFor: #SELECTOR>
>
> 	This method isn't meant for normal use; it exists
> 	to support SELECTOR.  If you want to understand
> 	this method, look at SELECTOR first.
>
> This is in addition to the grouping of methods into "categories"
> that is traditional in Smalltalk.
>
> When I look at a function in a module, I need to know
>
> 	WHAT	does this function do?
> 	HOW	does it do that?
> 	WHY	does it exist?
>
> The thing that is systematically missing from things like UML
> is the WHY.  I find that while UML _could_ tell me a great
> deal, in practice it _doesn't_ tell me anything I couldn't
> get from a cross-referencer.
>
> So if I am staring at a diagram showing a bunch of processes,
> I want to know WHY they are there.  If there are two processes
> with a third managing them, is work being sent to them both,
> is one a hot standby for the other, are they using different
> algorithms so that one is serving as a cross-check for another,
> is one a high security version and one a low security version,
> or what?
>
> Arguably, when I know WHY processes exist, I should be able
> to figure out *whether* they should be linked.
>
That is a nice way of documenting it in the code.

For my own needs I still desire to have an architectural diagram of a system to
better understand the interactions. One still has to describe the WHY - that never
goes away, not even with a pretty diagram.

>> Funny that you bring this up - there has just been an application for an EU grant
>> (CoCo) to look at communication contracts between processes, which is very much about
>> building on top of UBF and getting something that captures the protocols in the same way
>> specs capture types today.
>> 
>> I'd love to get protocols covered as well, but I don't feel that squeezing them into
>> Visual Erlang is the right approach.
>
> Look, instead of
>
> 	(Process_Var : module_name)
>
> the only change you need is to use
>
> 	(Process_Var : protocol_name)
>
> instead.  Protocols really *are* "types for processes".
> The bulk of the information about the protocol could be in some
> other document, but *some* identification of the protocol
> belongs in the process box.
>
Hmmmm, that's an interesting way of approaching it.

I have to think about how to fit this with the more structural view that Visual
Erlang has right now.
Both have their place, so I'd like to cover both.
And your remark about the protocol to be in some other document actually lends itself
quite nicely to getting both aspects integrated.

> There's arguably something else.
>
> UBF has the notion of a process being in a *state*.
> For that matter, Microsoft's Sing# language for the
> Singularity operating system has precisely the same
> notation built into its core syntax for channels.
>
> It isn't just instances of gen_fsm that have states.
>
>
> Maybe you could have
>
>           (process)
>          /         \
> <|state|>           <|state|>
>
> or something.  The way OPM puts states inside objects is a royal
> road to unreadability for non-trivial diagrams.
>
I have that in my paper notes - I didn't get around to putting them into the document yet.

There will be no putting state inside processes!

>> I have stolen some of the ideas from Object-Process Methodology (OPM).
>
> Hmm.  The only thing I find more off-putting than "Holistic"
> in a book title is "Wholistic".
>
> If you thought OPM was a good starting point, then the idea of
> displaying UBF states in an Erlang notation should appeal.
>
Indeed - I have shown that in hand-written diagrams at conferences already.

>> Hopefully the CoCo project will shed some light on this.
>
> Well, there's a pretty strong glimmer right here.
>
> In UML, you have |instance name: class name|.
> The class name is *NOT* the name of whatever it is
> that creates the instance and it is NOT the name
> of the package that contains the 'new' expression.
> The logical application of that to Erlang is
> (process name: protocol name) as suggested above.
> protocol : process :: interface : object.
>
That does not look too bad - will keep it in mind.

> I don't suppose the CoCo project want a long-distance collaborator (:-).
>
Let's get the EU to approve the project first!

But I, for one, would not object to having long-distance collaborators like you
involved. I'm pretty sure the rest of the consortium is on the same page.

> There are at least three purposes a graphical notation may serve:
>
> (1) Design.
>
>     For this, it is important that it should be very easy to
>     *change* a diagram and there should be good tools to *check* it.
>
> (2) Code generation.
>
>     Given a sufficiently detailed set of diagrams, you might
>     want to generate (skeletal) code.  This absolutely requires
>     that the notation SCALE both up (to 100,000 line systems at
>     least) and down.
>
> (3) Documentation.
>
>     What this requires above all is readability.
>
>     A central limitation on readability is that visual
>     diagrams just *cannot* hold very much information.
>     And *that* means that you have to have a multitude
>     of overlapping diagrams and need good tools
>     for flicking through dozens if not hundreds of
>     diagrams, being able to specify what to look at by
>     something information retrieval-ish and by
>     something databasish.
>
> It's not clear to me that a notation designed for one of these
> purposes will be particularly good at the others.
>
> You know, I think the way to get started on something like
> this is not to work from the bottom up picking notation from
> elsewhere and adapting it, but to take something like Yaws
> or Cowboy and try to document *that*.  Above all, it would
> not be possible to avoid the scale issue, and _that_, I think,
> is where a "visual" notation needs the most thought.
It may come across as a bottom-up approach, but Jesper and I started out using OPM to
describe Erlang Concurrency Patterns, which sort of worked, but then I was strongly
advised to invent Visual Erlang by someone who had been down the OPM route with
Erlang.

So Visual Erlang is born out of a need to be able to express these concurrency
patterns.
I admit that it has not yet been tried on anything at scale, so there is a to-do on
documenting something bigger before we reach v1.0.0.

Right now I just want to close what to me is an obvious void on the architectural
level and then take the next problem after that.

Hope this makes sense.

Thanks a lot for the very constructive and valuable feedback - it is appreciated and
will be used!

Cheers,
Torben
-- 
Torben Hoffmann
CTO
Erlang Solutions Ltd.
Tel: +45 25 14 05 38
http://www.erlang-solutions.com