[erlang-questions] CBSE anybody

Wed Apr 9 10:54:43 CEST 2008

Pardon my ignorance but ...

For a long, long time I have been pondering the problem of how we glue
(software) things together.

The simplest of all methods is the unix pipe.

    cat foo | grep ... | awk ...| wc

(or whatever)

aside:
    When people say "parallel programming is difficult!" I guess they
    weren't thinking of the unix pipe notation - the above is (of course)
    a parallel program with 4 parallel processes .-)

The problem with the pipe is the low level of abstraction across the
pipe boundary - we
just send two types of thing - individual characters and endOfStream.

Erlang messaging is nicer in this respect - since we send well formed
data structures
(terms). This eliminates all the parsing and serializing of data that
is needed with raw unix pipes.
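
To make the contrast with the pipe concrete, here is a minimal sketch of a
three-stage pipeline built from Erlang processes. The module name pipe_demo,
the message shapes {line, L} and eos, and the "erlang" filter string are all
made up for illustration; the point is only that whole terms, not characters,
cross each process boundary.

```erlang
-module(pipe_demo).
-export([run/1]).

%% Hypothetical pipeline: caller (source) -> filter -> counter.
%% The "pipe" carries Erlang terms plus an explicit end-of-stream
%% marker (the atom eos). All names here are illustrative assumptions.

run(Lines) ->
    Self = self(),
    Counter = spawn(fun() -> counter(Self, 0) end),
    Filter = spawn(fun() -> filter(Counter) end),
    %% The caller acts as the source stage.
    lists:foreach(fun(L) -> Filter ! {line, L} end, Lines),
    Filter ! eos,
    receive
        {count, N} -> N
    end.

%% Forward only the lines that contain the substring "erlang".
filter(Next) ->
    receive
        {line, L} ->
            case string:find(L, "erlang") of
                nomatch -> ok;
                _ -> Next ! {line, L}
            end,
            filter(Next);
        eos ->
            Next ! eos
    end.

%% Count the lines that arrive, then report to the caller at eos.
counter(Parent, N) ->
    receive
        {line, _} -> counter(Parent, N + 1);
        eos -> Parent ! {count, N}
    end.
```

No stage has to parse or serialize anything: each one pattern matches
directly on the terms it receives.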

Now suppose we wish to describe a complete Erlang system in a way that
abstracts away from
the irrelevant detail of what goes on inside a process.

The "natural" way to do this is to talk in terms of protocols and of
the interconnection of components
that terminate these protocols.

Protocols can be elegantly described in some hastily constructed CSP
like notation.

Here's an example of a file server in such a notation

Protocol FileServer(in, out) {

    Start = Operation -> Start.

    Operation = Get | Put | List.

    Get = in ? {get, file()} -> out ! {file, bin()}
                                    |  out ! enofile.

    Put = in ? {put, filename::string(), data::bin()} -> out ! yes
                                                      |  out ! no.

}

    and so on -- here I've taken a few liberties with the Erlang type
notation and with CSP notation - but
you should get the idea.

    This describes a black box with ports called "in" and "out".

    From such a description it's easy to write a dynamic type checker
that checks that the black box obeys the protocol.
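
    As a rough sketch of what such a checker might look like, the code below
walks a list of observed port events and checks them against the Get and Put
rules of the FileServer protocol. The module name fs_check, the event shape
{in, Term} | {out, Term}, the state names, and the reading of file() as a
string are assumptions of this sketch; List is left out because its rule is
not spelled out in the spec.

```erlang
-module(fs_check).
-export([check/1]).

%% check(Trace) -> ok | {violation, State, Event}
%% Trace is a list of {in, Term} | {out, Term} events seen at the ports.
%% A trace that stops in the middle of an operation is still accepted
%% as a valid prefix. All names are illustrative assumptions.

check(Trace) -> check(start, Trace).

check(_State, []) ->
    ok;
check(State, [Event | Rest]) ->
    case step(State, Event) of
        {ok, Next} -> check(Next, Rest);
        error -> {violation, State, Event}
    end.

%% Get = in ? {get, file()} -> out ! {file, bin()} | out ! enofile.
step(start, {in, {get, File}}) when is_list(File) -> {ok, wait_get};
step(wait_get, {out, {file, Bin}}) when is_binary(Bin) -> {ok, start};
step(wait_get, {out, enofile}) -> {ok, start};
%% Put = in ? {put, string(), bin()} -> out ! yes | out ! no.
step(start, {in, {put, Name, Data}})
  when is_list(Name), is_binary(Data) -> {ok, wait_put};
step(wait_put, {out, yes}) -> {ok, start};
step(wait_put, {out, no}) -> {ok, start};
%% Anything else breaks the protocol.
step(_State, _Event) -> error.
```

The checker is just a state machine whose states are the points in the
protocol where the box is waiting for something.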

    How do we describe the interconnections of a set of black boxes?

    Here *diagrams* seem very helpful. I would therefore like to
describe the system using three levels of abstraction:

    1)  At the top level of the system I want to draw a diagram
         showing how the components are connected together
    2) At the middle layer I want to *specify* the components
    3) At the lower level I want to implement the components.

   Now it seems to me that 3) is solved very nicely in Erlang.

   What we therefore need is to focus on 1) and 2)

   I recently came across a wiki page describing "component based
software engineering"
   (see http://en.wikipedia.org/wiki/Software_componentry)

    And it struck me that a)  "this is what I've been doing all the
time only with a different name"
and b) "I've been doing this bottom up, rather than top down" (which
is why level 3 is clear in my head
but not 1 and 2)

   Reading the wiki page was interesting - it classifies Morrison's
flow based programming
(http://en.wikipedia.org/wiki/Flow-based_programming ) as a CBSE
architecture - for a long time
I've been a fan of the  FBP view of the world.

   It seems to me that the bits are beginning to click together - and
we can begin to join up the dots.

   I have noticed one thing that seems to distinguish bad areas in
projects from good - it seems to me to have to
do with whether the project terminates a protocol or not. Let me give
an example.

   If I say "this software terminates the XYZ protocol as defined in
RFCxxxx" (or something) then in some
sense the project seems easy and very doable.

   But if we say "and we need to manage the stuff" or "make it
scalable" then we run into a mass
of wiggly worms. "manage the stuff" is NOT a protocol, making it
scalable is NOT a protocol.

Things that do not have names are difficult to talk about - so the
whole idea of pattern based programming
is to identify common things and give them names.

What we do not do in Erlang is *name protocols* - they are there
implicitly but not explicitly.

If we were to say the system terminates "management protocol XXX" or
"scalability protocol YYY"
then the problem becomes easier - suddenly the vague concept of
"management" becomes a question
of protocols.

 If I look at some code with send and receive statements I cannot
"see" which particular messages
"escape the black box" and contribute to the protocol and which do not.

Some simple annotations to the code can solve this - as an example
let's go back to the file server

let's annotate the code - something like

     receive
          {a_message, X,Y} ->
                  ....
          {Pid, Bin} ->
                    %% the annotation that links the code to the protocol spec
                    event(in, "{get,  "}),
                    %% Term must be a variable; a lowercase term is an atom
                    Term = binary_to_term(Bin),
                    ...

     end

Now when I analyze the code it corresponds to the protocol spec.
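
One way the event annotation could be backed, purely as a sketch: a small
collector process that remembers every event in arrival order, so the
recorded trace can later be compared with the protocol spec. The module name
proto_trace, the start/0 and stop/0 functions, and the choice to pass terms
rather than strings to event/2 are assumptions made for this example.

```erlang
-module(proto_trace).
-export([start/0, event/2, stop/0]).

%% Hypothetical trace collector. event/2 is the annotation called from
%% server code; the collector keeps the events in arrival order.

%% Start the collector and register it under the name proto_trace.
start() ->
    register(proto_trace, spawn(fun() -> loop([]) end)),
    ok.

%% Record that Term crossed the port named in or out.
event(Port, Term) when Port =:= in; Port =:= out ->
    proto_trace ! {event, Port, Term},
    ok.

%% Stop the collector and return the recorded trace, oldest first.
stop() ->
    proto_trace ! {stop, self()},
    receive
        {trace, Events} -> Events
    end.

loop(Acc) ->
    receive
        {event, Port, Term} -> loop([{Port, Term} | Acc]);
        {stop, From} -> From ! {trace, lists:reverse(Acc)}
    end.
```

Messages that never pass through event/2 stay inside the black box, which is
exactly the distinction the annotation is meant to make visible.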

Aside:
    Interestingly when I debug a process I use io:format but my
io:format's are not placed *randomly*
    they are placed immediately *after* the receive patterns that
correspond to interactions with the external world
    and immediately *before* sends that send messages to the outside world.

It strikes me that it would be highly beneficial to align the top two
levels of Erlang to
methodologies adopted in the CBSE world.

Now I cannot conceive of using WSDL to describe protocols - so I think
some kind of
CSP'ish notation would suffice. For a transport layer I am uncertain -
we could use
Erlang terms (external format) for all messaging (and a type system to
describe them), but this
would hinder interoperability - of the currently available formats
something like JSON is not
too bad - or my UBF (see http://www.sics.se/~joe/ubf). Possibly both
JSON and UBF.
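
For the Erlang external format option, the encoding side is already built in.
The sketch below wraps term_to_binary/1 and binary_to_term/2; the module name
wire_demo and the error atom are invented for the example. It relies on the
fact that an external-format binary starts with the version byte 131.

```erlang
-module(wire_demo).
-export([encode/1, decode/1]).

%% Hypothetical wrapper around the Erlang external term format.

encode(Term) ->
    term_to_binary(Term).

%% Only the leading version byte is checked here; binary_to_term/2 still
%% raises badarg if the rest of the binary is malformed. The safe option
%% refuses to create new atoms from incoming data.
decode(<<131, _/binary>> = Bin) ->
    {ok, binary_to_term(Bin, [safe])};
decode(_Other) ->
    {error, not_external_term_format}.
```

A non-Erlang peer would need its own encoder and decoder for this format,
which is the interoperability cost mentioned above.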

The highest level should be a drag and draw gui thingy to describe the
interconnection
between components. UML has a notation for this (it has a notation for
*everything*) which could be cannibalized.

Does anybody have experience with this kind of way of building
software - there seems to be a vast
literature - a search turned up book titles like
 -
UML Components: A Simple Process for Specifying Component-Based
Software  (Cheesman and Daniels)
...

There seem to be dozens of books of this ilk.

Can anybody recommend any books here that might enlighten me or should
I just buy a dozen or so and read them all?

What I'm after is

    1) a graphic notation showing component integration for the top
level of design.
    2) A formal notation for describing protocols for the middle level
    3) A low level way of implementing the protocol

I also want a *universal* messaging format for
interprocess-communication. Any votes for Erlang external term
format/JSON/UBF/Lisp S-expressions
/whatever?

Which bits should we invent for ourselves - and what should we
borrow/adapt/steal?

Comments please!

/Joe Armstrong